multi-user communication systems, including multi-channel power control, flow
control, and wireless random access.
CHAPTER 1


Introduction 

Mathematical communications theory, which started with Shannon’s seminal paper
“A mathematical theory of communication”, is a fairly young but rapidly
maturing science just over sixty years old. Shannon’s original work focused on
communication scenarios between a single transmitter and receiver pair [Sha48].
These communication models are referred to as single-user channels, for which the
capacities are now well-investigated. Practical communication systems are
inherently competitive environments, where multiple transmitters and receivers
executing a variety of applications and services share the same transmission medium
and compete for limited network resources. As opposed to its single-user
counterpart, the characterization of multi-user environments is much more complicated.
This is because in resource-constrained communication networks, a user’s utility
is usually affected not only by its own action but also by the actions taken
by all the other users sharing the same resources. Due to the mutual coupling
among users, the performance optimization of multi-user communication systems
becomes quite challenging.

1.1  Game  Theory  in  Multi-user  Communication 

Game theory provides a formal framework for describing and analyzing the
interactions of multiple decision makers. Recently, there has been a surge in research
activities that adopt game theoretic tools to investigate a wide range of
problems in multi-user communication theory, such as flow and congestion control,
network routing, load balancing, power control, peer-to-peer content sharing,
etc. [ABE06, MW01, FH06, NRT07, JLL09]. The majority of the existing game
theoretic research works formalize the multi-user interactions in various
communication scenarios as a strategic game, which is a suitable model for the analysis
of a game where all users act independently and simultaneously according to their
own self-interests and a priori knowledge of the other users’ strategies. A
strategic game can be formally defined as a tuple


$$\Gamma = \langle \mathcal{N}, \mathcal{A}, u \rangle. \qquad (1.1)$$

In particular, $\mathcal{N} = \{1, 2, \ldots, N\}$ is the set of communication devices, which
are the rational decision makers in the system. Define $\mathcal{A}$ to be the joint
action set $\mathcal{A} = \times_{n \in \mathcal{N}} \mathcal{A}_n$, with $\mathcal{A}_n \subseteq \mathcal{R}^K$ being the action set
available for user $n$. The vector utility function $u = \times_{n \in \mathcal{N}} u_n$ is a mapping from
the individual users’ joint action set to real numbers, i.e. $u : \mathcal{A} \to \mathcal{R}^N$. In
particular, $u_n(a) : \mathcal{A} \to \mathcal{R}$ is the utility of the $n$th user that generally depends on
the strategies $a = (a_n, a_{-n})$ of all users, where $a_n \in \mathcal{A}_n$ denotes a feasible action
of user $n$, and $a_{-n} = \times_{m \neq n} a_m$ is a vector of the actions of all users except $n$.
We also denote by $\mathcal{A}_{-n} = \times_{m \neq n} \mathcal{A}_m$ the joint action set of all users except $n$.
To capture the multi-user performance tradeoff, the utility region is defined as
$\mathcal{U} = \{(u_1(a), \ldots, u_N(a)) \mid \exists\, a = (a_1, a_2, \ldots, a_N) \in \mathcal{A}\}$. Depending on the
characteristics of different applications, numerous game-theoretical models have been
proposed to characterize the multi-user interactions and optimize the users’
decisions in communication networks. A variety of game theoretic solutions, such
as Nash equilibrium (NE) and Pareto optimality [FT91], have been developed to
characterize the resulting performance of the multi-user interactions. Depending
on the feasibility of real-time information exchange among users, significant
research efforts have been devoted in the literature to constructing operational
algorithms in order to achieve NE and Pareto optimality in various games with
special structures of action set $\mathcal{A}_n$ and utility function $u_n$.

1.1.1  Nash  Equilibrium 

To avoid the overhead associated with exchanging information between users in
real-time, network designers may prefer fully decentralized solutions in which
the participating users simply compete against other users by choosing actions
$a_n \in \mathcal{A}_n$ to selfishly maximize their individual utility functions $u_n(a_n, a_{-n})$, given
the actions $a_{-n} \in \mathcal{A}_{-n}$. Most of these approaches focus on investigating the
existence and properties of NE.

Definition 1.1 A profile $a^*$ of actions constitutes a Nash equilibrium of $\Gamma$ if
$u_n(a_n^*, a_{-n}^*) \geq u_n(a_n', a_{-n}^*)$ for all $a_n' \in \mathcal{A}_n$ and all $n \in \mathcal{N}$.

At  NE,  given  the  other  users’  actions,  no  user  can  increase  its  utility  alone  by 
changing  its  action.  For  an  extensive  discussion  of  the  methodologies  studying 
the  existence,  uniqueness,  and  convergence  of  various  equilibria  in  communication 
networks,  we  refer  the  readers  to  [LDA09]. 
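
Definition 1.1 can be illustrated on a small finite game. The sketch below is only an illustration under stated assumptions: the two-user payoff table `PAYOFF`, the action labels 0 (low power) and 1 (high power), and the helper names are hypothetical and do not come from the cited works. It enumerates all joint action profiles and keeps those from which no user can gain by a unilateral deviation.

```python
from itertools import product

# Hypothetical payoff table for two users choosing low (0) or high (1) power.
# PAYOFF[a] = (u_1(a), u_2(a)); the numbers are illustrative only.
PAYOFF = {
    (0, 0): (3, 3),
    (0, 1): (1, 4),
    (1, 0): (4, 1),
    (1, 1): (2, 2),
}
ACTION_SETS = [(0, 1), (0, 1)]


def utility(n, profile):
    """Utility u_n(a) of user n under the joint action profile a."""
    return PAYOFF[profile][n]


def is_nash_equilibrium(profile):
    """True if no user n can raise u_n by changing only its own action a_n."""
    for n, actions in enumerate(ACTION_SETS):
        for alternative in actions:
            deviated = profile[:n] + (alternative,) + profile[n + 1:]
            if utility(n, deviated) > utility(n, profile):
                return False
    return True


equilibria = [a for a in product(*ACTION_SETS) if is_nash_equilibrium(a)]
print(equilibria)
```

In this toy table the only equilibrium is the profile in which both users choose high power, even though both would be better off at (0, 0).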

For example, to establish the existence of and convergence to a pure NE, we
can examine whether $\mathcal{A}$ and $u$ satisfy the conditions of concave games,
supermodular games, potential games, etc. Specifically, to apply the existence result of
a pure NE in concave games [FT91, Ros65], we need to check the following
conditions: i) each player’s action set $\mathcal{A}_n$ is convex and compact; and ii) the utility
function $u_n(a_n, a_{-n})$ is continuous in $a$ and quasi-concave$^1$ in $a_n$ for any fixed
$a_{-n}$. As additional examples of games that guarantee the convergence to NE, it
is well-known that, in supermodular games [Top98, AA03] and potential games
[MS96, SBP06], the best response dynamics can be used to search for a pure NE.
Suppose that utility function $u_n$ is twice continuously differentiable, $\forall n \in \mathcal{N}$. If
$\mathcal{A}_n$ is a compact subset of $\mathcal{R}$ (or more generally $\mathcal{A}_n$ is a nonempty and compact
sublattice$^2$ of $\mathcal{R}^K$), $\forall n \in \mathcal{N}$, establishing that game $\Gamma$ is a supermodular game is
equivalent to showing that $u_n$ satisfies

$$\forall (m, n) \in \mathcal{N}^2,\ m \neq n, \quad \frac{\partial^2 u_n}{\partial a_n \partial a_m} \geq 0. \qquad (1.2)$$

$^1$ A real-valued function $f$ is quasi-concave if dom $f$ is convex and $\{x \in \text{dom } f \mid f(x) \geq \alpha\}$ is
convex for all $\alpha$.


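If each action set $\mathcal{A}_n$ in game $\Gamma$ is an interval of real numbers, we can show that game
$\Gamma$ is a potential game by verifying

$$\forall (m, n) \in \mathcal{N}^2,\ m \neq n, \quad \frac{\partial^2 (u_n - u_m)}{\partial a_n \partial a_m} = 0. \qquad (1.3)$$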
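
Both structural tests can be checked numerically for a candidate game. The sketch below is a rough illustration, not a proof technique: it assumes two users with scalar actions in [0, 1], hypothetical utilities `u0` and `u1`, and a central finite-difference estimate of the mixed second derivative. The potential-game test used here is the symmetric cross-partial condition commonly associated with [MS96], namely that the mixed partial of $u_n - u_m$ vanishes; treat that choice as an assumption of this example.

```python
def cross_partial(f, a, i, j, step=1e-3):
    """Central finite-difference estimate of d^2 f / (da_i da_j) at the point a."""
    def shifted(di, dj):
        b = list(a)
        b[i] += di
        b[j] += dj
        return f(b)
    return (shifted(step, step) - shifted(step, -step)
            - shifted(-step, step) + shifted(-step, -step)) / (4.0 * step * step)


# Hypothetical utilities of a two-user game with scalar actions in [0, 1].
def u0(a):
    return a[0] * a[1] - a[0] ** 2


def u1(a):
    return a[0] * a[1] - a[1] ** 2


UTILITIES = [u0, u1]
GRID = [(x / 4.0, y / 4.0) for x in range(5) for y in range(5)]


def is_supermodular(tol=1e-6):
    """Check d^2 u_n / (da_n da_m) >= 0 on the grid for every pair m != n."""
    return all(cross_partial(UTILITIES[n], a, n, m) >= -tol
               for a in GRID for n in range(2) for m in range(2) if m != n)


def satisfies_potential_condition(tol=1e-6):
    """Check d^2 (u_0 - u_1) / (da_0 da_1) = 0 on the grid."""
    def difference(a):
        return u0(a) - u1(a)
    return all(abs(cross_partial(difference, a, 0, 1)) <= tol for a in GRID)


print(is_supermodular(), satisfies_potential_condition())
```

A grid check of this kind can only falsify the conditions or lend them plausibility; establishing them for all feasible actions still requires the analytical argument.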


1.1.2  Pareto  Optimality 

It is important to note that operating at a Nash equilibrium will
generally limit the performance of the user itself as well as that of the entire
network, because the available network resources are not always effectively
exploited due to the conflicts of interest occurring among users. As opposed to
the NE-based approaches, there exists a large body of literature that focuses on
studying how users can jointly improve the system performance by optimizing
a certain common objective function $f(u_1(a), u_2(a), \ldots, u_N(a))$. This function
represents the fairness rule based on which the system-wide resource allocation is
performed. Different objective functions, e.g. sum utility maximization in which
$f(u_1(a), u_2(a), \ldots, u_N(a)) = \sum_{n=1}^{N} u_n(a)$, can provide reasonable allocation
outcomes by jointly considering fairness and efficiency. A profile of actions is Pareto
optimal if there is no other profile of actions that makes every user at least as
well off and at least one user strictly better off.

$^2$ A real $K$-dimensional set $\mathcal{V}$ is a sublattice of $\mathcal{R}^K$ if for any two elements $a, b \in \mathcal{V}$, the
component-wise minimum, $a \wedge b$, and the component-wise maximum, $a \vee b$, are also in $\mathcal{V}$.
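
The definition of Pareto optimality translates directly into a dominance test over a finite set of profiles. The following sketch assumes a hypothetical two-user utility table `UTILITY_TABLE` (the numbers are illustrative only) and returns the profiles that are not dominated by any other profile.

```python
# Hypothetical utilities (u_1(a), u_2(a)) for each joint action profile a.
UTILITY_TABLE = {
    (0, 0): (3, 3),
    (0, 1): (1, 4),
    (1, 0): (4, 1),
    (1, 1): (2, 2),
}


def dominates(u, v):
    """True if u makes every user at least as well off as v and some user strictly better off."""
    return all(x >= y for x, y in zip(u, v)) and any(x > y for x, y in zip(u, v))


def pareto_optimal_profiles(table):
    """Profiles whose utility vector is not dominated by that of any other profile."""
    return [a for a, u in table.items()
            if not any(dominates(v, u) for v in table.values())]


pareto_set = pareto_optimal_profiles(UTILITY_TABLE)
print(pareto_set)
```

In this table the profile (1, 1) is dominated by (0, 0), so it is excluded, while the two asymmetric profiles remain Pareto optimal even though they are unfair to one user.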

The majority of these approaches focus on studying how to find the optimum
joint policy efficiently or in a distributed manner. There exists a large body of literature
that investigates how to compute Pareto optimal solutions in large-scale
networks where centralized solutions are infeasible. Numerous convergence results
have been obtained for various generic distributed algorithms. An important
example is the NUM framework that develops distributed algorithms to solve
network resource allocation problems [CLC07]. The majority of the results in
the existing NUM literature are based on convex optimization theory, in which
the investigated problems share the following structures: the objective function
$f(u_1(a), u_2(a), \ldots, u_N(a))$ is convex$^3$, inequality resource constraint functions are
convex, and equality resource constraint functions are affine. It is well-known
that, for convex optimization problems, users can collaboratively exchange price
signals that reflect the “cost” for consuming the constrained resources between
each other and the Pareto optimal allocation that maximizes the network utility
can be determined in a fully distributed manner [PC06].
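
The role of price signals can be sketched on a toy problem. The example below is an assumption-laden illustration, not the algorithm of [CLC07] or [PC06]: it assumes a single shared resource of capacity `capacity`, hypothetical weighted logarithmic utilities $w_n \log x_n$, and a fixed step size. Each user picks the rate that maximizes its utility minus the priced cost, and the price is raised when demand exceeds capacity and lowered otherwise.

```python
def price_based_allocation(weights, capacity, step=0.1, iterations=200):
    """Dual (price) iteration for: maximize sum_n w_n * log(x_n) s.t. sum_n x_n <= capacity."""
    price = 2.0  # arbitrary positive starting price (assumption)
    for _ in range(iterations):
        # Each user locally maximizes w_n * log(x_n) - price * x_n, giving x_n = w_n / price.
        rates = [w / price for w in weights]
        # The price moves in the direction of the excess demand and stays positive.
        price = max(1e-6, price + step * (sum(rates) - capacity))
    return [w / price for w in weights], price


rates, price = price_based_allocation(weights=[1.0, 2.0, 3.0], capacity=6.0)
print(rates, price)
```

For these hypothetical weights the iteration settles at a price of 1 and rates proportional to the weights, which is the allocation that maximizes the sum utility under the capacity constraint. Only the scalar price has to be communicated, which is the sense in which the computation is distributed.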

Summarizing, these general structural results without and with real-time
message exchange turn out to be very useful when analyzing various multi-user
interactions in communication networks. The majority of the existing game theoretic
research works in communication networking applications usually depend on these
specific structures and inter-user coupling of their action sets and utility
functions. By considering or even architecting these specific structures, the associated
games become analytically tractable and possess various important convergence
properties. Numerous existing works are devoted to constructing or shaping the
multi-user coupling such that it fits into these frameworks and the corresponding
generic solutions can be directly applied.

$^3$ $f : \mathbf{R}^n \to \mathbf{R}$ is convex if dom $f$ is a convex set and $f(\theta x + (1 - \theta) y) \leq \theta f(x) + (1 - \theta) f(y)$,
$\forall x, y \in \text{dom } f$, $0 \leq \theta \leq 1$.

1.2  Motivation 

The rapid increase in the demand for data rate over wired and wireless
communication networks has led to a rethinking of the traditional network architecture and
design principles. In fact, communication systems are inherently informationally
decentralized competitive environments, where multiple devices executing a
variety of applications and services need to locally adapt their transmission strategies
based on their available information and compete for scarce networking resources.
The concepts and techniques that have dominated multi-user communication
research in recent years are not well suited for these informationally decentralized
environments. Specifically, most existing research has focused on two extreme
multi-user interaction scenarios:

• the complete information scenario with a common system-wide objective.
This scenario assumes either that all participating users transmit their private
information to a trusted moderator or peer (e.g. access point, base station,
selected network leader, etc.), which is given the authority to fairly divide the
wireless resources among the participating users, or that, in a distributed
environment, users exchange information with each other such that the information
required for achieving the common objective is obtained. These solutions can
lead to Pareto efficient allocations at the cost of a large amount of information
exchange.

• the private information scenario with conflicting objectives, where selfish
users interact by assuming no or limited information about each other (e.g.
information about the other users’ channel or traffic characteristics), usually resulting
in inefficient solutions such as NE.

Both aforementioned communication scenarios encourage a passive
participation of the users in the multi-user interaction, because they assume that users
cannot proactively influence the resource division. However, such multi-user
network designs do not take advantage of the transceivers’ “smartness” (i.e. their
ability to acquire information, learn and reason about their opponents). As a
result, these passive interactions among users may lead in practice to inefficient
resource usage. Even in fully collaborative communication scenarios, some users
may not be able to exchange certain desired information. This is because they
are not able to make optimal decisions on which information to exchange due
to their bounded rationality (e.g. limited memory or complexity constraints),
or because they cannot transmit their complete private information due to their
communication constraints.

1.3  Overview  of  Dissertation 

The objective of this dissertation is to characterize users’ optimal strategies to
improve their performance subject to varying degrees of informational constraints in
several classes of multi-user communication environments. We will mainly focus
on fully distributed solutions without any real-time information exchange between
different users, which satisfy the informational efficiency requirement
in communication systems. In particular, to achieve coordination
without real-time information exchange, we will fully explore the structures of
the investigated inter-user coupling, enable the devices to proactively
accumulate information via observed outcomes of the historical interactions with other
devices and, based on this information, build their beliefs about the competing
devices and the environment to optimize their transmission strategies.

In  order  to  capture  the  characteristics  of  mutual  coupling  among  multiple 
users  and  classify  various  communication  games,  we  need  to  introduce  two  new 
elements  S  and  s  into  the  traditional  strategic  game  formulation  [FT91]  and 
redefine  the  multi-user  game  in  various  communication  scenarios  as  a  tuple 

$$\Gamma = \langle \mathcal{N}, \mathcal{A}, u, \mathcal{S}, s \rangle. \qquad (1.4)$$

Specifically, $\mathcal{S}$ is the state space $\mathcal{S} = \times_{n \in \mathcal{N}} \mathcal{S}_n$, where $\mathcal{S}_n$ is the part of the state
relevant to user $n$. The state is defined to capture the effects of the multi-user
coupling such that each user’s utility solely depends on its own state and action.
In other words, the utility function $u = \times_{n \in \mathcal{N}} u_n$ is a mapping from the individual
users’ state space and action space to real numbers, $u_n : \mathcal{S}_n \times \mathcal{A}_n \to \mathcal{R}$. The
state determination function $s = \times_{n \in \mathcal{N}} s_n$ maps joint actions to states for each
component $s_n : \mathcal{A}_{-n} \to \mathcal{S}_n$, in which $\mathcal{A}_{-n} = \times_{m \neq n} \mathcal{A}_m$. Note that traditional
strategic games simply assume $\mathcal{S}_n = \mathcal{A}_{-n}$.
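
The state-based formulation can be made concrete with a small sketch. Everything specific below is an assumption chosen for illustration: a scalar action per user, a state equal to the aggregate interference from the other users with a common hypothetical gain `CROSS_GAIN`, and a logarithmic utility. The point it shows is that two different competitor action vectors $a_{-n}$ that induce the same state $s_n$ are indistinguishable from user $n$'s perspective.

```python
import math

CROSS_GAIN = 0.5  # hypothetical coupling coefficient between any two users
NOISE = 1.0       # hypothetical background noise level


def state(n, actions):
    """State determination function s_n: depends only on the other users' actions a_{-n}."""
    return sum(CROSS_GAIN * a for m, a in enumerate(actions) if m != n)


def utility(own_state, own_action):
    """Utility u_n: S_n x A_n -> R, a function of the user's own state and action only."""
    return math.log2(1.0 + own_action / (NOISE + own_state))


# Two joint profiles that differ in a_{-0} but induce the same state for user 0.
profile_a = (1.0, 1.0, 3.0)
profile_b = (1.0, 2.0, 2.0)

state_a, state_b = state(0, profile_a), state(0, profile_b)
utility_a = utility(state_a, profile_a[0])
utility_b = utility(state_b, profile_b[0])
print(state_a, state_b, utility_a, utility_b)
```

Because user 0 only needs its own state, it can evaluate its utility from a local observation (here, the aggregate interference) without learning the individual actions of its competitors.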

Based on the formulation above, we derive several important structural results
for several special classes of multi-user interaction scenarios in communication
networks under the informational efficiency constraint. In particular, we want
to investigate three key problems in information-constrained multi-user
communication systems:

Question 1: When will a distributed algorithm (e.g. best response
dynamics) converge to a NE? And how fast?

Question 2: If information is constrained and no information exchange
between users is allowed, how can an inefficient NE be improved without message
passing?

Question 3: Assuming no real-time information exchange between users,
can we still achieve Pareto optimality?

First of all, to address Question 1, Chapter 2 proposes and analyzes a broad
family of games played by resource-constrained players, which are referred to as
Additively Coupled Sum Constrained Games (ACSCG) and are characterized by
the following central features: 1) each user has a multi-dimensional action space
$\mathcal{A}_n$ subject to a single sum resource constraint; 2) user $n$’s utility in a
particular dimension $k$ depends on an additive coupling between user $n$’s action in the
same dimension and a state determined by the actions of the other users; and 3)
each user’s total utility $u_n$ is the sum of the utilities obtained in each dimension.
Familiar examples of such multi-user environments in communication systems
include power control over frequency-selective Gaussian interference channels and
flow control in Jackson networks. In settings where users cannot exchange
messages in real-time, we study how users can adjust their actions based on their
local observations. We derive sufficient conditions under which a unique Nash
equilibrium exists and the best-response algorithm converges globally and
linearly to the Nash equilibrium. In settings where users can exchange messages in
real-time, we focus on user choices that optimize the overall utility in a distributed
manner. We provide the convergence conditions of two distributed action update
mechanisms, gradient play and Jacobi update.

As the first step to address Question 2, Chapter 3 considers the problem
of how to allocate power among competing users sharing a frequency-selective
interference channel. The multi-channel power control game is a special case of
ACSCG in which user $n$’s state represents its experienced interference in different
channels. We model the interaction between selfish users as a non-cooperative
game. As opposed to the existing iterative water-filling algorithm, which studies
myopic users, this chapter studies how a foresighted user, who knows the
channel state information and response strategies of its competing users, should
optimize its transmission strategy. To characterize this multi-user interaction,
the Stackelberg equilibrium is introduced, and the existence of this equilibrium
for the investigated non-cooperative game is shown. We analyze this interaction
in more detail using a simple two-user example, where the foresighted user
determines its transmission strategy by solving a bi-level program, which allows
it to account for the myopic user’s response. It is analytically shown that a
foresighted user can improve its performance if it has the necessary
information about its competitors. Since computing the optimal solution of the Stackelberg equilibrium
is computationally prohibitive, we propose a practical low-complexity approach
based on Lagrangian duality theory. Surprisingly, numerical simulations show
that, in most of the simulation settings, the Stackelberg equilibrium results in
higher rates for both the foresighted and the myopic users.

To further address Question 2, Chapter 4 discusses how a foresighted user in
multi-channel power control games can acquire its desired information by
modeling its experienced interference as a function of its own power allocation. To
characterize the outcome of the multi-user interaction, the conjectural
equilibrium is introduced, and the existence of this equilibrium for the investigated
power control game is proved. Interestingly, both the Nash equilibrium and the
Stackelberg equilibrium are shown to be special cases of the generalization of
conjectural equilibrium. We also develop practical algorithms to form accurate
beliefs and search for desirable power allocation strategies. We show that a foresighted
user without any a priori knowledge of its competitors’ private information can
effectively learn the required information through repeated interaction with its
competitors, and induce the entire system to an operating point that improves
both its own achievable rate and the rates of the other participants in the
power control game.
Finally, to answer Question 3, Chapters 5 and 6 discuss another special type
of multi-user communication scenario named linearly coupled games, in which
users’ states $s_n$ are linearly impacted by their competitors’ actions $a_{-n}$. We
characterize the inherent structures of the utility functions $u_n$ for the linearly
coupled games and define two basic types of linearly coupled games. Both the
investigated Type I and Type II games apply to a variety of realistic applications
encountered in multiple access design, including wireless random access and
rate control. For both Type I and Type II linearly coupled games, to improve
the inefficient NE, we investigate the properties of conjectural equilibrium, in
which individual users compensate for their lack of information by forming
internal beliefs about their competitors. In both games, it is analytically shown that
all the achievable operating points in the throughput region are essentially
stable conjectural equilibria corresponding to different conjectures. Moreover, it is
shown that the Pareto boundaries of the investigated linearly coupled games can
be sustained as stable conjectural equilibria without real-time information
exchange among users, if the belief functions are properly initialized. Specifically,
Chapter 5 investigates Type II games and analyzes the necessary and sufficient
condition that guarantees the global convergence of the best response and
Jacobi update dynamics. Chapter 6 investigates Type I games using the wireless
random access game as an illustrative example. We enable nodes to proactively
gather information, form internal conjectures on how their competitors would
react to their actions, and update their beliefs according to their local
observations. In this way, nodes are able to autonomously “learn” the behavior
of their competitors, optimize their own actions, and eventually cultivate
reciprocity in the random access network. Two distributed conjecture-based action
update mechanisms, best response and gradient play, are proposed to
stabilize the random access network. The sufficient conditions that guarantee the
convergence of the proposed conjecture-based algorithms are derived. We also
investigate how the conjectural equilibrium can be selected in heterogeneous networks
and how the proposed methods can be extended to ad-hoc networks. Numerical
simulations verify that the proposed algorithms significantly outperform existing
protocols, such as the IEEE 802.11 Distributed Coordination Function (DCF)
protocol and the priority-based fair medium access control (P-MAC) protocol, in terms
of throughput, fairness, convergence, and stability.

Chapter 7 summarizes the main points of the dissertation. In informationally
decentralized multi-user environments, by exploring the structures of inter-user
coupling and designing appropriate belief functions, conjectural equilibrium based
solutions can achieve satisfactory performance without any real-time information
exchange between users.
CHAPTER 2


Additively  Coupled  Sum  Constrained  Games 

2.1  Introduction 

Power control is one of the first few communication problems in which researchers
started to apply game theoretic tools to formalize the multi-user interaction and
characterize its properties. An interesting and important topic that has been
extensively investigated recently is how to optimize multiple devices’ power
allocation when sharing a common frequency-selective interference channel. In
[YGC02], Yu et al. first defined such a power control game from a game-theoretic
perspective, proposed a best-response algorithm in which all users iteratively
update their power allocations using the water-filling solution, and proved several
sufficient conditions under which the algorithm globally converges to a unique
pure NE. Many follow-up papers further establish various sufficient convergence
conditions with or without real-time information exchange for power control in
communication networks [CSK03, CHC07, SPB08, HBH06, SBH08]. The purpose
of this chapter is to introduce and analyze a general framework that abstracts the
common characteristics of this family of multi-user interaction scenarios, which
includes, but is not limited to, the power control scenario. In particular, the main
contributions of this chapter are as follows.
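
The best-response procedure described above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not the algorithm as specified in [YGC02]: direct channel gains are normalized to one, all cross gains equal a single hypothetical value `cross_gain`, the noise levels in `NOISE` are made up, and each water-filling step is solved by bisection on the water level.

```python
def water_fill(effective_noise, budget, tol=1e-10):
    """Best response of one user: maximize sum_k log(1 + p_k / effective_noise[k])
    subject to p_k >= 0 and sum_k p_k <= budget, by bisection on the water level."""
    low, high = 0.0, max(effective_noise) + budget
    while high - low > tol:
        level = 0.5 * (low + high)
        used = sum(max(0.0, level - v) for v in effective_noise)
        if used > budget:
            high = level
        else:
            low = level
    return [max(0.0, low - v) for v in effective_noise]


def iterative_water_filling(noise, cross_gain, budget, rounds=50):
    """Sequential best-response updates; noise[n][k] is user n's noise in channel k."""
    users, channels = len(noise), len(noise[0])
    power = [[0.0] * channels for _ in range(users)]
    for _ in range(rounds):
        for n in range(users):
            # Interference plus noise seen by user n in each channel.
            effective = [noise[n][k] + sum(cross_gain * power[m][k]
                                           for m in range(users) if m != n)
                         for k in range(channels)]
            power[n] = water_fill(effective, budget)
    return power


NOISE = [[1.0, 2.0], [2.0, 1.0]]  # hypothetical noise levels, two users, two channels
allocation = iterative_water_filling(NOISE, cross_gain=0.2, budget=2.0)
print(allocation)
```

With weak coupling such as the value assumed here, the updates settle quickly and each user concentrates its power in its cleaner channel; with strong coupling the same iteration need not converge, which is why sufficient convergence conditions are of interest.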

First of all, we define the class of Additively Coupled Sum Constrained Games
(ACSCG), which captures and characterizes the key features of several
communication and networking applications. In particular, the central features of ACSCG
are: 1) each user has a multi-dimensional strategy that is subject to a single sum
resource constraint; 2) each user’s payoff in each dimension is impacted by an
additive combination of its own action in the same dimension and a function of
the other users’ actions; 3) users’ utilities are separable across different
dimensions and each user’s total utility is the sum of the utilities obtained within each
dimension.

Second, based on the feasibility of real-time information exchange, we provide
the convergence conditions of various generic distributed algorithms in different
scenarios. When no message exchanges between users are possible and every
user maximizes its own utility, it is essential to determine whether a NE exists
and, if so, how to achieve such an equilibrium. A pure NE exists
in ACSCG because ACSCG belongs to the class of concave games [FT91, Ros65]. Our key
contribution in this context is that we investigate the uniqueness of the pure NE
and consider the best response dynamics to compute the NE. We explore the
properties of the additive coupling among users given the sum constraint and
provide several sufficient conditions under which best response dynamics
converges linearly$^1$ to the unique NE, for any feasible initialization with either
sequential or parallel updates. We also explain the relationship between our
results and the conditions previously developed in the game theory literature
[Ros65, GM80]. When users can collaboratively exchange messages with each
other in real-time, we present the sufficient convergence conditions of two
alternative distributed pricing algorithms, gradient play and Jacobi update,
to coordinate users’ actions and improve the overall system efficiency. The
proposed convergence conditions generalize the results that have been previously
obtained in [YGC02, CSK03, CHC07, SPB08, HBH06, SBH08] for the multi-user
power control problem and they are immediately applicable to other multi-user
applications in communication networks that fulfill the requirements of ACSCG.

$^1$ A sequence $x^{(k)}$ with limit $x^*$ is linearly convergent if there exists a constant $c \in (0, 1)$ such
that $|x^{(k)} - x^*| \leq c\,|x^{(k-1)} - x^*|$ for $k$ sufficiently large [BV04].
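
Linear convergence in the sense used here can be observed numerically by tracking the ratio of successive errors. The sketch below uses a hypothetical scalar affine map `best_response` with fixed point 2 as a stand-in for a best-response operator; it is not derived from any ACSCG instance and only illustrates the definition.

```python
def best_response(x):
    """Hypothetical scalar best-response map with fixed point x* = 2."""
    return 0.5 * x + 1.0


x_star = 2.0
x = 10.0  # arbitrary feasible starting point (assumption)
errors = []
for _ in range(10):
    errors.append(abs(x - x_star))
    x = best_response(x)

# Ratio |x^(k) - x*| / |x^(k-1) - x*| for successive iterates.
ratios = [errors[k] / errors[k - 1] for k in range(1, len(errors))]
print(ratios)
```

Every ratio equals the contraction constant of the map, here 0.5, so the error shrinks by a fixed factor per iteration. A ratio that stays bounded away from one is the empirical signature of linear convergence.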

The rest of this chapter is organized as follows. Section 2.2 defines the model
of ACSCG. For ACSCG models, Sections 2.3 and 2.4 present several distributed
algorithms without and with real-time information exchanges, respectively, and
provide sufficient conditions that guarantee the convergence of the proposed
algorithms. Section 2.5 presents the numerical examples, and concluding remarks
are drawn in Section 2.6.


2.2  Game  Model  and  Examples 

In  this  section,  we  present  the  definition  of  ACSCG  and  subsequently,  we  present 
several  exemplary  multi-user  scenarios  which  appertain  to  this  new  class  of  game. 


2.2.1  Definition  of  ACSCG 


Definition 2.1 A multi-user interaction $\Gamma = \langle \mathcal{N}, \mathcal{A}, u, \mathcal{S}, s \rangle$ is an ACSCG if it
satisfies the following assumptions:

A1: $\forall n \in \mathcal{N}$, action set $\mathcal{A}_n \subseteq \mathcal{R}^K$ is defined to be$^2$

$$\mathcal{A}_n = \Big\{ (a_n^1, a_n^2, \ldots, a_n^K) \;\Big|\; a_n^k \in [a_{n,k}^{\min}, a_{n,k}^{\max}] \text{ and } \sum_{k=1}^{K} a_n^k \leq M_n \Big\}. \qquad (2.1)$$

A2: There exist $h_n^k : \mathcal{R} \to \mathcal{R}$, $f_n^k : \mathcal{A}_{-n} \to \mathcal{R}$, and $g_n^k : \mathcal{A}_{-n} \to \mathcal{R}$,
$k = 1, \ldots, K$, such that

$$u_n(a) = \sum_{k=1}^{K} \Big[ h_n^k\big(a_n^k + f_n^k(a_{-n})\big) - g_n^k(a_{-n}) \Big] \qquad (2.2)$$

for all $a \in \mathcal{A}$ and $n \in \mathcal{N}$. $h_n^k(\cdot)$ is an increasing, twice differentiable, and
strictly concave function, $f_n^k(\cdot)$ and $g_n^k(\cdot)$ are both twice differentiable functions
which correspond to the state determination functions associated with user $n$ in
dimension $k$.

$^2$ We consider a sum constraint throughout the chapter rather than a weighted-sum constraint,
because a weighted-sum constraint can be easily converted to a sum constraint by rescaling $\mathcal{A}_n$.
Besides, we nontrivially assume that $\sum_{k=1}^{K} a_{n,k}^{\max} \geq M_n$.


The ACSCG model defined by assumptions A1 and A2 covers a broad class
of multi-user interactions. Assumption A1 indicates that each player’s action
set is a $K$-dimensional vector set and its action vector is sum-constrained. This
represents the communication scenarios in which each user needs to determine its
multi-dimensional action in various channels or networks while the total amount
of resources it can consume is constrained. Assumption A2 implies that each
user’s utility is separable and can be represented by the summation of concave
functions $h_n^k$ minus “penalty” functions $g_n^k$ across the $K$ dimensions. In
particular, within each dimension, the input of $h_n^k$ is an additive combination of user
$n$’s action $a_n^k$ and state determination function $f_n^k(a_{-n})$ that depends on the
remaining users’ joint action $a_{-n}$. Since $a_n^k$ only appears in the concave function
$h_n^k$, it implies that each user’s utility is concave in its own action, i.e. there are
diminishing returns per unit of user $n$’s invested action $a_n$, which is common for many
application scenarios in communication networks.

Summarizing, the key features of the game model defined by A1 and A2 include: each user's action is subject to a sum constraint, and users' utilities are impacted by additive combinations of $a_n^k$ and $f_n^k(\mathbf{a}_{-n})$ through concave functions $h_n^k$. Therefore, we term the game $\Gamma$ that satisfies assumptions A1 and A2 as ACSCG. In the following section, we present several illustrative multi-user interaction examples that belong to ACSCG.
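To make the structure of A1 and A2 concrete, the following is a minimal Python sketch that evaluates a utility of the form (2.2) and checks feasibility under (2.1). The function names, the logarithmic choice of $h$, and the coupling coefficient `F` are illustrative assumptions, not part of the model definition.

```python
import math

def acscg_utility(n, a, h, f, g):
    """Evaluate u_n(a) = sum_k [ h(a_n^k + f(n, k, a)) - g(n, k, a) ] as in (2.2).

    a is a list of per-user action vectors. h, f and g are caller-supplied
    (hypothetical) callables; f and g must not depend on user n's own action.
    """
    K = len(a[n])
    return sum(h(a[n][k] + f(n, k, a)) - g(n, k, a) for k in range(K))

def is_feasible(a_n, lo, hi, M, tol=1e-12):
    """Check assumption A1: per-dimension box bounds and the sum constraint."""
    in_box = all(l - tol <= x <= u + tol for x, l, u in zip(a_n, lo, hi))
    return in_box and sum(a_n) <= M + tol

# Toy instance: two users, two dimensions, linear coupling (all values assumed).
F = 0.3

def h(x):
    return math.log(1.0 + x)          # increasing and strictly concave

def f(n, k, a):
    return sum(F * a[m][k] for m in range(len(a)) if m != n)

def g(n, k, a):
    return 0.0                        # no penalty term in this toy instance

a = [[0.5, 0.5], [0.2, 0.8]]
u0 = acscg_utility(0, a, h, f, g)
```

Here `u0` is user 0's utility at the joint action `a`; swapping in a different concave `h` or a nonlinear `f` leaves the evaluation code unchanged.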


2.2.2  Examples  of  ACSCG 

We present four examples that satisfy assumptions A1 and A2 and belong to ACSCG. The details of functions $f_n^k(\cdot)$ and $g_n^k(\cdot)$ in each example are summarized in Table 2.1. For each example, Table 2.1 also summarizes the applicable convergence conditions that will be provided in the remaining parts of the chapter.

Example 2.1 We first consider a simple two-user game with two-dimensional action spaces, i.e. $N = K = 2$. The utility functions are given by$^3$

$$u_n(\mathbf{a}) = -\exp\Big\{-a_n^1 - \sqrt{(a_{-n}^1)^2 + 1} + \sqrt{(a_{-n}^2)^2 + 1}\Big\} - \exp\Big\{-a_n^2 + \sqrt{(a_{-n}^1)^2 + 1} - \sqrt{(a_{-n}^2)^2 + 1}\Big\},$$

for $n = 1, 2$. The resource constraints are $\sum_{k=1}^{2} a_n^k \le M_n$, in which $M_n > 0$ and $a_n^k \ge 0$ for $\forall n, k$.

3 In this example, since there are only two users, the subindex $-n$ denotes the user other than $n$.

Example 2.2 (Power control in frequency-selective Gaussian interference channel [YGC02, SPB08]) There are $N$ transmitter and receiver pairs in the system. The entire frequency band is divided into $K$ frequency bins. In frequency bin $k$, the channel gain from transmitter $i$ to receiver $j$ is denoted as $H_{ij}^k$, where $k = 1, 2, \ldots, K$. Similarly, denote the noise power spectral density (PSD) that receiver $n$ experiences as $\sigma_n^k$ and player $n$'s transmit PSD as $P_n^k$. The action of user $n$ is to select its transmit power $\mathbf{P}_n = [P_n^1\ P_n^2\ \cdots\ P_n^K]$ subject to its power constraint $\sum_{k=1}^{K} P_n^k \le P_n^{\max}$. For a fixed $\mathbf{P}_{-n}$, if treating its interference as noise, user $n$ can achieve the following data rate:

$$r_n(\mathbf{P}) = \sum_{k=1}^{K} \log_2\Big(1 + \frac{H_{nn}^k P_n^k}{\sigma_n^k + \sum_{m \ne n} H_{mn}^k P_m^k}\Big) = \sum_{k=1}^{K} \Big( \log_2\Big(\sigma_n^k + \sum_{m=1}^{N} H_{mn}^k P_m^k\Big) - \log_2\Big(\sigma_n^k + \sum_{m \ne n} H_{mn}^k P_m^k\Big) \Big). \tag{2.3}$$
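The two ways of writing the achievable rate in Example 2.2 (a logarithm of one plus the signal-to-interference-plus-noise ratio, and a difference of two logarithms) can be checked numerically. The sketch below assumes the indexing `H[i][j][k]` for the gain from transmitter `i` to receiver `j` in bin `k`; the numerical values are made up for illustration.

```python
import math

def rate_sinr_form(n, P, H, sigma):
    """Sum over bins of log2(1 + signal / (noise + interference))."""
    N, K = len(P), len(P[n])
    total = 0.0
    for k in range(K):
        interference = sum(H[m][n][k] * P[m][k] for m in range(N) if m != n)
        total += math.log2(1.0 + H[n][n][k] * P[n][k] / (sigma[n][k] + interference))
    return total

def rate_difference_form(n, P, H, sigma):
    """Same rate written as log2(total received) - log2(noise + interference)."""
    N, K = len(P), len(P[n])
    total = 0.0
    for k in range(K):
        interference = sum(H[m][n][k] * P[m][k] for m in range(N) if m != n)
        received = sigma[n][k] + interference + H[n][n][k] * P[n][k]
        total += math.log2(received) - math.log2(sigma[n][k] + interference)
    return total

# Hypothetical two-user, two-bin channel.
H = [[[1.0, 0.8], [0.2, 0.1]],
     [[0.3, 0.25], [0.9, 1.1]]]
sigma = [[0.1, 0.1], [0.1, 0.1]]
P = [[0.5, 0.5], [0.4, 0.6]]
r_a = rate_sinr_form(0, P, H, sigma)
r_b = rate_difference_form(0, P, H, sigma)
```

The second form makes the ACSCG structure visible: the first logarithm plays the role of $h_n^k$ applied to an additive combination of powers, and the second depends only on the other users' powers, like $g_n^k$.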


Example 2.3 (Delay minimization in Jackson networks [CY01]) As an additional example, we consider a network of $N$ nodes. A Poisson stream of external packets arrives at node $n$ with rate $\psi_n$, and the input stream is split into $K$ traffic classes, which are individually served by exponential servers. Denote node $n$'s input rate and service rate for class $k$ as $\psi_n^k$ and $\mu_n^k$, respectively. Therefore, the action of node $n$ is to determine the rates for different traffic classes $\boldsymbol{\psi}_n = [\psi_n^1\ \psi_n^2\ \cdots\ \psi_n^K]$, and the total rate is subject to the minimum rate constraint $\sum_{k=1}^{K} \psi_n^k \ge \psi_n^{\min}$. The packets of the same traffic class constitute a Jackson network in which Markovian routing is adopted: packets of class $k$ completing service at node $m$ are routed to node $n$ with probability $r_{mn}^k$ or exit the network with probability $r_{m0}^k = 1 - \sum_{n=1}^{N} r_{mn}^k$. Denote the arrival rate for class $k$ at node $n$ as $\eta_n^k$. By Jackson's Theorem, we have $\eta_n^k = \psi_n^k + \sum_{m=1}^{N} \eta_m^k r_{mn}^k$, $n = 1, 2, \ldots, N$. Denote $[\mathbf{R}^k]_{mn} = r_{nm}^k$, $\boldsymbol{\Upsilon}^k = (\mathbf{I} - \mathbf{R}^k)^{-1}$, and $v_{mn}^k = [\boldsymbol{\Upsilon}^k]_{nm}$. Equivalently, we have $\eta_n^k = \sum_{m=1}^{N} v_{mn}^k \psi_m^k$. Each node aims to minimize its total M/M/1 queueing delay incurred by accommodating its traffic:

$$d_n(\boldsymbol{\Psi}) = \sum_{k=1}^{K} \frac{1}{\mu_n^k - \sum_{m=1}^{N} v_{mn}^k \psi_m^k}. \tag{2.4}$$

Example 2.3 can be shown to be a special case of ACSCG by slightly transforming the action sets and utilities. We can define user $n$'s action as $-\boldsymbol{\psi}_n$. For user $n$, the sum constraint becomes $\sum_{k=1}^{K} -\psi_n^k \le -\psi_n^{\min}$, and minimizing $d_n(\boldsymbol{\Psi})$ is equivalent to maximizing $-d_n(\boldsymbol{\Psi})$.
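As an illustration of the traffic equations used in Example 2.3, the following Python sketch solves the per-class equation "arrival rate = external input + routed traffic" by fixed-point iteration and evaluates the standard M/M/1 mean delay $1/(\mu - \eta)$. The routing matrix, rates, and function names are assumptions made for this example only; the iteration converges when every packet eventually leaves the network.

```python
def jackson_arrival_rates(psi, r, iters=1000):
    """Solve eta_n = psi_n + sum_m eta_m * r[m][n] for one traffic class.

    psi[n] is the external input rate at node n; r[m][n] is the probability
    that a packet finishing service at node m is routed to node n.
    """
    N = len(psi)
    eta = list(psi)
    for _ in range(iters):
        eta = [psi[n] + sum(eta[m] * r[m][n] for m in range(N)) for n in range(N)]
    return eta

def mm1_delay(mu, eta):
    """Mean time in an M/M/1 system with service rate mu and arrival rate eta."""
    if eta >= mu:
        raise ValueError("queue is unstable: arrival rate must be below service rate")
    return 1.0 / (mu - eta)

# Hypothetical two-node network, single class.
psi = [1.0, 2.0]
r = [[0.0, 0.5],
     [0.25, 0.0]]
eta = jackson_arrival_rates(psi, r)
delays = [mm1_delay(mu, e) for mu, e in zip([3.0, 4.0], eta)]
```

Because the arrival rates are linear in all nodes' input rates, each node's delay depends on an additive combination of its own and the other nodes' actions, which is the coupling structure required by A2.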
Example 2.4 (Asynchronous transmission in digital subscriber line networks [CHC07]) The basic setting of this example is similar to that of Example 2.2, except that inter-carrier interference (ICI) exists among different frequency bins. Due to the loss of orthogonality, the interference that user $n$ experiences in frequency bin $k$ is

$$\sum_{m \ne n} \sum_{j=1}^{K} \gamma(k - j) H_{mn}^j P_m^j, \tag{2.5}$$

in which $\gamma(j)$ is the ICI coefficient that represents the relative interference that a signal transmitted in a particular frequency bin generates in its $j$th neighboring bin. In particular, it takes the form

$$\gamma(j) = \begin{cases} 1, & \text{if } j = 0, \\ \dfrac{2}{K^2 \sin^2\big(\frac{\pi}{K} j\big)}, & \text{if } j \ne 0,\ -\frac{K}{2} \le j \le \frac{K}{2}. \end{cases} \tag{2.6}$$

It satisfies the symmetric and circular properties, i.e. $\gamma(-j) = \gamma(j) = \gamma(K - j)$. User $n$'s achievable rate in the presence of ICI is given by

$$r_n(\mathbf{P}) = \sum_{k=1}^{K} \log_2\Big(1 + \frac{H_{nn}^k P_n^k}{\sigma_n^k + \sum_{m \ne n} \sum_{j=1}^{K} \gamma(k - j) H_{mn}^j P_m^j}\Big). \tag{2.7}$$
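The symmetric and circular properties of the ICI coefficient can be checked numerically. The sketch below assumes the closed form $\gamma(j) = 2 / (K^2 \sin^2(\pi j / K))$ for $j \ne 0$ and $\gamma(0) = 1$; the function name is hypothetical.

```python
import math

def ici_coefficient(j, K):
    """ICI coefficient gamma(j) for an integer bin offset j and K bins.

    The offset is reduced modulo K, which encodes the circular property.
    Assumed closed form: 1 for j = 0, else 2 / (K * sin(pi * j / K))**2.
    """
    j = j % K
    if j == 0:
        return 1.0
    return 2.0 / (K * math.sin(math.pi * j / K)) ** 2

K = 8
gammas = [ici_coefficient(j, K) for j in range(K)]
```

For `K = 8` the coefficient at the farthest bin (`j = 4`) is `2 / 64`, far below the value 1 at `j = 0`, which reflects that ICI is concentrated on nearby bins.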


2.2.3  Issues  Related  to  ACSCG 


Since  the  ACSCG  model  represents  a  good  abstraction  of  numerous  multi-user 
resource  allocation  problems,  we  aim  to  investigate  the  convergence  properties 
of  various  distributed  algorithms  in  ACSCG  without  and  with  real-time  message 
passing. 

ACSCG is a concave game [FT91, Ros65] and therefore, it admits at least one pure NE. In practice, we want to provide the sufficient conditions under which best response dynamics provably and globally converges to a pure NE. However, the existing literature, e.g. the diagonal strict concavity (DSC) conditions in [Ros65] and the supermodular game theory [Top98, Yao95, AA03], does not provide such convergence conditions for the general ACSCG model. For example, the DSC conditions developed for general concave games do not guarantee the convergence of best response dynamics [Ros65]. Even if the utility functions in ACSCG possess the supermodular type structure, due to the sum constraint, the action set of each user is generally not a sublattice$^4$ of $\mathbb{R}^K$. Therefore, the convergence results based on supermodular games cannot be directly applied in ACSCG. On the other hand, if we want to maximize the sum utility by enabling real-time message passing among users, we also note that the utility $u_n$ is not necessarily jointly concave in $\mathbf{a}$ because of the existence of $g_n^k(\cdot)$. Therefore, the existing algorithms developed for the convex NUM are not immediately applicable either.

In fact, a unique feature of the ACSCG is that different users' actions are additively coupled in $h_n^k(\cdot)$ and each user's action space is sum-constrained. In the following sections, we will fully explore these specific structures and address the convergence properties of various distributed algorithms in two different scenarios. Specifically, Section 2.3 investigates the scenarios in which each user $n$ can only observe $\{f_n^k(\mathbf{a}_{-n})\}_{k=1}^{K}$ and cannot exchange any information with any other user.
Section  2.4  focuses  on  the  scenarios  in  which  each  user  n  is  able  to  announce  and 
receive  information  in  real-time  to  and  from  the  remaining  users  about  d\ tyf1  and 

4 In supermodular games, for each player, the action set is a nonempty and compact sublattice of $\mathbb{R}^K$. We can verify that with the sum constraint, $\mathcal{A}_n$ is usually not a sublattice of $\mathbb{R}^K$ by taking the component-wise maximum.


2.3 Scenario I: No Message Exchange among Users

In communication scenarios where users cannot exchange messages to achieve coordination, the participating users can simply choose actions to selfishly maximize their individual utility functions $u_n(\mathbf{a})$ without taking into account the utility degradation caused to the other users. In particular, each user individually solves the following optimization program:

$$\max_{\mathbf{a}_n \in \mathcal{A}_n} u_n(\mathbf{a}). \tag{2.8}$$

The steady state outcome of such a multi-user interaction is usually characterized as a NE, at which, given the other users' actions, no user can increase its utility by unilaterally changing its action. It is worth pointing out that, since there is no coordination signal among users, a NE generally does not lead to a Pareto-optimal solution. Section 2.4 will discuss distributed algorithms in which users exchange coordination signals in order to improve the system efficiency.

2.3.1  Properties  of  Best  Response  Dynamics  in  ACSCG 

To better understand the key properties of the ACSCG, in this subsection, we first focus on the scenarios in which $f_n^k(\mathbf{a}_{-n})$ is a linear combination of the remaining users' actions in the same dimension $k$, i.e.

$$f_n^k(\mathbf{a}_{-n}) = \sum_{m \ne n} F_{mn}^k a_m^k \tag{2.9}$$

and $F_{mn}^k \in \mathbb{R}$, $\forall m, n, k$. Specifically, both Examples 2.2 and 2.3 in Table 2.1 belong to this category. In Section 2.3.2, we will extend the results derived for the functions $f_n^k(\mathbf{a}_{-n})$ defined in (2.9) to general $f_n^k(\mathbf{a}_{-n})$.

Since $h_n^k(\cdot)$ is concave, the objective in (2.8) is a concave function in $a_n^k$ when the other users' actions $\mathbf{a}_{-n}$ are fixed. To find the globally optimal solution of the problem in (2.8), we can first form its Lagrangian

$$L_n(\mathbf{a}_n, \lambda) = u_n(\mathbf{a}) + \lambda \Big( M_n - \sum_{k=1}^{K} a_n^k \Big), \tag{2.10}$$

in which $a_n^k \in [a_{n,k}^{\min}, a_{n,k}^{\max}]$. By taking the first derivatives of (2.10), we have

$$\frac{\partial L_n(\mathbf{a}_n, \lambda)}{\partial a_n^k} = \frac{\partial h_n^k\big(a_n^k + \sum_{m \ne n} F_{mn}^k a_m^k\big)}{\partial a_n^k} - \lambda = 0. \tag{2.11}$$

Denote

$$l_n^k(\mathbf{a}_{-n}, \lambda) \triangleq \bigg[ \Big[\frac{\partial h_n^k}{\partial x}\Big]^{-1}(\lambda) - \sum_{m \ne n} F_{mn}^k a_m^k \bigg]_{a_{n,k}^{\min}}^{a_{n,k}^{\max}}, \tag{2.12}$$

in which $\big[\frac{\partial h_n^k}{\partial x}\big]^{-1}$ is the inverse function$^5$ of $\frac{\partial h_n^k}{\partial x}$ and $[x]_b^a = \max\{\min\{x, a\}, b\}$.

The optimal solution of (2.8) is given by $a_n^{*k} = l_n^k(\mathbf{a}_{-n}, \lambda^*)$, where the Lagrange multiplier $\lambda^*$ is chosen to satisfy the sum constraint $\sum_{k=1}^{K} a_n^{*k} = M_n$.

We define the best response operator $B_n^k(\cdot)$ as

$$B_n^k(\mathbf{a}_{-n}) = l_n^k(\mathbf{a}_{-n}, \lambda^*). \tag{2.13}$$

5 If $\nexists\, x = x^*$ such that $\frac{\partial h_n^k}{\partial x}\big|_{x = x^*} = \lambda$, we let $\big[\frac{\partial h_n^k}{\partial x}\big]^{-1}(\lambda) = -\infty$.

We consider the dynamic adjustment process in which users revise their actions over time based on their observations about their opponents. A well-known candidate for such adjustment processes is the so-called best response dynamics. In the best response algorithm, each user updates its action using the best response strategy that maximizes its utility function in (2.2). We consider two types of update orders: sequential update and parallel update. Specifically, in sequential update, individual players iteratively optimize in a circular fashion with respect to their own actions while keeping the actions of their opponents fixed. Formally, at stage $t$, user $n$ chooses its action according to

$$a_n^{k,t} = B_n^k\big([\mathbf{a}_1^t, \ldots, \mathbf{a}_{n-1}^t, \mathbf{a}_{n+1}^{t-1}, \ldots, \mathbf{a}_N^{t-1}]\big). \tag{2.14}$$

On the other hand, players adopting the parallel update revise their actions at stage $t$ according to

$$a_n^{k,t} = B_n^k(\mathbf{a}_{-n}^{t-1}). \tag{2.15}$$
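The best response (2.12)-(2.13) and the two update orders (2.14)-(2.15) can be sketched in Python for one concrete case. The sketch assumes $h_n^k(x) = \log(\alpha_n^k + x)$, for which the inverse of the derivative is $1/\lambda - \alpha_n^k$, so the best response is a water-filling rule with water level $w = 1/\lambda$ found by bisection. All names and numbers below are illustrative assumptions.

```python
def water_fill(c, lo, hi, M, iters=100):
    """Return a with a[k] = clip(w - c[k], lo[k], hi[k]) and sum(a) = M.

    Assumes sum(lo) <= M <= sum(hi); the level w is found by bisection,
    using that the clipped sum is non-decreasing in w.
    """
    def alloc(w):
        return [min(max(w - ck, l), u) for ck, l, u in zip(c, lo, hi)]
    w_lo = min(ck + l for ck, l in zip(c, lo))   # allocation equals lo here
    w_hi = max(ck + u for ck, u in zip(c, hi))   # allocation equals hi here
    for _ in range(iters):
        w = 0.5 * (w_lo + w_hi)
        if sum(alloc(w)) < M:
            w_lo = w
        else:
            w_hi = w
    return alloc(w_hi)

def best_response(n, a, alpha, F, lo, hi, M):
    """Best response of user n to the other users' current actions."""
    N, K = len(a), len(a[n])
    # c[k] collects alpha_n^k plus the linear coupling term of (2.9).
    c = [alpha[n][k] + sum(F[m][n][k] * a[m][k] for m in range(N) if m != n)
         for k in range(K)]
    return water_fill(c, lo[n], hi[n], M[n])

def sequential_brd(a, alpha, F, lo, hi, M, sweeps=50):
    a = [list(v) for v in a]
    for _ in range(sweeps):
        for n in range(len(a)):
            a[n] = best_response(n, a, alpha, F, lo, hi, M)  # sees newest actions
    return a

def parallel_brd(a, alpha, F, lo, hi, M, sweeps=50):
    a = [list(v) for v in a]
    for _ in range(sweeps):
        # every user responds to the previous stage's profile
        a = [best_response(n, a, alpha, F, lo, hi, M) for n in range(len(a))]
    return a

# Hypothetical weakly coupled two-user, two-dimension game.
alpha = [[0.1, 0.3], [0.2, 0.1]]
F = [[[0.0, 0.0], [0.2, 0.1]],       # F[m][n][k]: effect of user m on user n
     [[0.1, 0.3], [0.0, 0.0]]]
lo = [[0.0, 0.0], [0.0, 0.0]]
hi = [[1.0, 1.0], [1.0, 1.0]]
M = [1.0, 1.0]
start = [[0.5, 0.5], [0.5, 0.5]]
a_seq = sequential_brd(start, alpha, F, lo, hi, M)
a_par = parallel_brd(start, alpha, F, lo, hi, M)
```

In this weakly coupled instance both update orders settle on the same profile, and re-applying `best_response` to it leaves each user's action unchanged, which is the defining property of a NE.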

We obtain several sufficient conditions under which best response dynamics converges. Similar convergence conditions are proved in [CSK03, CHC07, SPB08] for Example 2.2, in which $h_n^k(x) = \log_2(\sigma_n^k + H_{nn}^k x)$. We consider more general functions $h_n^k(\cdot)$ and further extend the convergence conditions in [CSK03, CHC07, SPB08]. The key differences among all the sufficient conditions which will be provided in this section are summarized in Table 2.2.


2.3.1.1 General $h_n^k(\cdot)$


The first sufficient condition is developed for the general cases in which the functions $h_n^k(\cdot)$ in the utilities $u_n(\cdot)$ are specified in assumption A2. Define

$$[\mathbf{T}^{\max}]_{mn} \triangleq \begin{cases} \max_k |F_{mn}^k|, & \text{if } m \ne n, \\ 0, & \text{otherwise,} \end{cases} \tag{2.16}$$

and let $\rho(\mathbf{T}^{\max})$ denote the spectral radius of the matrix $\mathbf{T}^{\max}$.


Theorem 2.1 If

$$\rho(\mathbf{T}^{\max}) < \frac{1}{2}, \tag{C1}$$

then there exists a unique NE in game $\Gamma$ and best response dynamics converges linearly to the NE, for any set of initial conditions belonging to $\mathcal{A}$ with either sequential or parallel updates.


Proof :  This  theorem  is  proved  by  showing  that  the  best  response  dynamics 
defined  in  (2.14)  and  (2.15)  is  a  contraction  mapping  under  (Cl).  See  Appendix 
A  for  details.  ■ 
In multi-user communication applications, it is common to have games of strategic complements (or strategic substitutes), i.e. the marginal returns to any one component of the player's action rise with increases (or decreases) in the components of the competitors' actions [BGK85]. For instance, in Examples 2.2 and 2.4, increasing user $n$'s transmitted power creates stronger interference to the other users and decreases their marginal achievable rates. Similarly, in Example 2.3, increasing node $n$'s input traffic rate congests all the servers in the network and increases the marginal queueing delay. Mathematically, if $u_n$ is twice differentiable, strategic complementarities (or strategic substitutes) can be described as

$$\frac{\partial^2 u_n(\mathbf{a}_n, \mathbf{a}_{-n})}{\partial a_n^j \partial a_m^k} \ge 0,\ \forall m \ne n, j, k, \quad \Big(\text{or } \frac{\partial^2 u_n(\mathbf{a}_n, \mathbf{a}_{-n})}{\partial a_n^j \partial a_m^k} \le 0,\ \forall m \ne n, j, k\Big). \tag{2.17}$$

We can verify that Examples 2.2, 2.3, and 2.4 are games with strategic substitutes. For the ACSCG models that exhibit strategic complementarities (or strategic substitutes), the following theorem further relaxes condition (C1).


Theorem 2.2 Let $\Gamma$ be an ACSCG with strategic complementarities (or strategic substitutes), i.e. $F_{mn}^k \le 0$, $\forall k, m \ne n$ (or $F_{mn}^k \ge 0$, $\forall k, m \ne n$). If

$$\rho(\mathbf{T}^{\max}) < 1, \tag{C2}$$

then there exists a unique NE in game $\Gamma$ and best response dynamics converges linearly to the NE, for any set of initial conditions belonging to $\mathcal{A}$ with either sequential or parallel updates.


Proof:  This  theorem  is  proved  by  adapting  the  proof  of  Theorem  2.1.  See 
Appendix  B.  ■ 
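Conditions (C1) and (C2) can be checked numerically once the coupling coefficients are known. The sketch below builds the matrix of (2.16) from hypothetical coefficients `F[m][n][k]` and estimates its spectral radius by power iteration on $\mathbf{T} + \mathbf{I}$, a shift that is valid for entrywise non-negative matrices because it raises the spectral radius by exactly one. The helper names are assumptions of this example.

```python
import math

def t_max(F):
    """[T]_{mn} = max_k |F[m][n][k]| for m != n and 0 on the diagonal, as in (2.16)."""
    N = len(F)
    return [[0.0 if m == n else max(abs(x) for x in F[m][n]) for n in range(N)]
            for m in range(N)]

def spectral_radius_nonneg(T, iters=2000):
    """Estimate rho(T) for an entrywise non-negative square matrix T.

    Power iteration is run on T + I (spectral radius rho(T) + 1), which
    avoids the oscillation plain power iteration shows on periodic matrices.
    """
    N = len(T)
    x = [1.0] * N                      # positive start vector with max-norm 1
    growth = 1.0
    for _ in range(iters):
        y = [x[i] + sum(T[i][j] * x[j] for j in range(N)) for i in range(N)]
        growth = max(y)                # growth factor, since max(x) == 1
        x = [v / growth for v in y]
    return growth - 1.0

# Hypothetical two-user coefficients with non-negative cross coupling.
F = [[[0.0, 0.0], [0.2, 0.1]],
     [[0.1, 0.3], [0.0, 0.0]]]
T = t_max(F)
rho = spectral_radius_nonneg(T)
satisfies_c1 = rho < 0.5
satisfies_c2 = rho < 1.0               # relevant when all F[m][n][k] share a sign
```

For this two-user instance the radius equals $\sqrt{0.2 \times 0.3} \approx 0.245$, so both conditions hold.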

Remark 2.1 (Implications of conditions (C1) and (C2)) Theorem 2.1 and Theorem 2.2 give sufficient conditions for best response dynamics to globally converge to a unique fixed point. Specifically, $\max_k |F_{mn}^k|$ can be regarded as a measure of the strength of the mutual coupling between users $m$ and $n$. The intuition behind (C1) and (C2) is that the weaker the coupling among different users is, the more likely it is that best response dynamics converges. Consider the extreme case in which $F_{mn}^k = 0$, $\forall k, m \ne n$. Since each user's best response is not impacted by the remaining users' action $\mathbf{a}_{-n}$, convergence is immediately achieved after a single best-response iteration. If no restriction is imposed on $F_{mn}^k$, Theorem 2.1 specifies a mutual coupling threshold under which best response dynamics provably converges. The proof of Theorem 2.1 can be intuitively interpreted as follows. We regard every best response update as the users' joint attempt to approach the NE. Due to the linear coupling structure in (2.9), user $n$'s best response in (2.12) contains a term $\sum_{m \ne n} F_{mn}^k a_m^k$ that is a linear combination of $\mathbf{a}_{-n}$. As a result, the residual error $\|\mathbf{a}_n^{t+1} - \mathbf{a}_n^t\|_1$, which is the 1-norm distance between the updated action profile $\mathbf{a}_n^{t+1}$ and the current action profile $\mathbf{a}_n^t$, can be upper-bounded using linear combinations of $\|\mathbf{a}_m^t - \mathbf{a}_m^{t-1}\|_1$ in which $m \ne n$. Recall that $F_{mn}^k$ can be either positive or negative. We also note that, if $\mathbf{a}_m^t \ne \mathbf{a}_m^{t-1}$, $\mathbf{a}_m^t - \mathbf{a}_m^{t-1}$ contains both positive and negative terms due to the sum constraint. In the worst case, the distance $\|\mathbf{a}_n^{t+1} - \mathbf{a}_n^t\|_1$ is maximized if $\{F_{mn}^k\}$ and $\{a_m^{k,t} - a_m^{k,t-1}\}$ are co-phase multiplied and additively summed, i.e. $F_{mn}^k (a_m^{k,t} - a_m^{k,t-1}) \ge 0$, for $\forall k = 1, \ldots, K$, $m \ne n$. After an iteration, all users except $n$ contribute to user $n$'s residual error at stage $t+1$ up to $2 \sum_{m \ne n} \max_k |F_{mn}^k| \, \|\mathbf{a}_m^t - \mathbf{a}_m^{t-1}\|_1$. Under condition (C1), it is guaranteed that the residual error contracts with respect to the special norm defined in (2.63). Theorem 2.2 focuses on the situations in which the signs of $F_{mn}^k$ are the same, $\forall m \ne n, k$. In this case, $\{F_{mn}^k\}$ and $\{a_m^{k,t} - a_m^{k,t-1}\}$ cannot be co-phase multiplied. Therefore, the region of convergence enlarges and hence, condition (C2) stated in Theorem 2.2 is weaker than condition (C1) in Theorem 2.1.
Remark 2.2 (Relation to the results in references [CSK03, CHC07, SPB08]) Similar to [CSK03, CHC07], our proofs choose the 1-norm as the distance measure for the residual errors $\mathbf{a}_n^{t+1} - \mathbf{a}_n^t$ after each best-response iteration. However, by manipulating the inequalities in a different way, condition (C2) is more general than the results in [CSK03, CHC07], where they require $\max_k |F_{mn}^k| < \frac{1}{N-1}$. Interestingly, condition (C2) recovers the result obtained in [SPB08], where it is proved by choosing the Euclidean norm as the distance measure for the residual errors $\mathbf{a}_n^{t+1} - \mathbf{a}_n^t$ after each best-response iteration. However, the approach in [SPB08] using the Euclidean norm only applies to the scenarios in which $h_n^k(\cdot)$ is a logarithmic function. We prove that condition (C2) applies to any $h_n^k(\cdot)$ that is increasing and strictly concave.


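2.3.1.2 A special class of $h_n^k(\cdot)$


In addition to conditions (C1) and (C2), we also develop a sufficient convergence condition for a family of utility functions parameterized by a negative number $\theta$. In particular, $h_n^k(\cdot)$ satisfies$^6$

$$h_n^k(x) = \begin{cases} \log(\alpha_n^k + F_{nn}^k x), & \text{if } \theta = -1, \\ \dfrac{(\alpha_n^k + F_{nn}^k x)^{\theta + 1}}{\theta + 1}, & \text{if } -1 < \theta < 0 \text{ or } \theta < -1, \end{cases} \tag{2.18}$$

and $\alpha_n^k \in \mathbb{R}$ and $F_{nn}^k > 0$. The interpretation of this type of utilities has been addressed in [MW00]. It is shown that varying the parameter $\theta$ leads to different types of fairness across $\alpha_n^k + F_{nn}^k \big(a_n^k + \sum_{m \ne n} F_{mn}^k a_m^k\big)$ for all $k$. In particular, $\theta = -1$ corresponds to proportional fairness; if $\theta = -2$, then harmonic mean fairness; and if $\theta = -\infty$, then max-min fairness. We can see that Examples 2.2 and 2.3 are special cases of this type of utility functions. In these cases, best response dynamics in equation (2.12) is reduced to

$$l_n^k(\mathbf{a}_{-n}, \lambda) = \bigg[ \Big(\frac{1}{F_{nn}^k}\Big)^{1 + \frac{1}{\theta}} \lambda^{\frac{1}{\theta}} - \frac{\alpha_n^k}{F_{nn}^k} - \sum_{m \ne n} F_{mn}^k a_m^k \bigg]_{a_{n,k}^{\min}}^{a_{n,k}^{\max}}. \tag{2.19}$$

Define

$$[\mathbf{S}^{\max}]_{mn} \triangleq \begin{cases} \dfrac{\sum_{k=1}^{K} (F_{mm}^k)^{1 + \frac{1}{\theta}}}{\sum_{k=1}^{K} (F_{nn}^k)^{1 + \frac{1}{\theta}}} \max_k \bigg\{ \dfrac{|F_{mn}^k| \, (F_{nn}^k)^{1 + \frac{1}{\theta}}}{(F_{mm}^k)^{1 + \frac{1}{\theta}}} \bigg\}, & \text{if } m \ne n, \\ 0, & \text{otherwise.} \end{cases} \tag{2.20}$$

6 If $\alpha_n^k + F_{nn}^k x \le 0$, we let $h_n^k(x) = -\infty$. We assume for this class of $h_n^k(\cdot)$ that for $\forall \mathbf{a}_{-n} \in \mathcal{A}_{-n}$, there exists $\mathbf{a}_n \in \mathcal{A}_n$ such that $\alpha_n^k + F_{nn}^k x > 0$ for $\forall n, k$.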


For  the  class  of  utility  functions  in  (2.18),  Theorem  2.3  gives  a  sufficient  condition 
that  guarantees  the  convergence  of  the  best  response  dynamics  defined  in  (2.19). 


Theorem 2.3 For $h_n^k(\cdot)$ defined in (2.18), if

$$\rho(\mathbf{S}^{\max}) < 1, \tag{C3}$$

then there exists a unique NE in game $\Gamma$ and best response dynamics converges linearly to the NE, for any set of initial conditions belonging to $\mathcal{A}$ and with either sequential or parallel updates.


Proof :  It  can  be  proved  by  showing  that  the  best  response  dynamics  defined 
in  (2.19)  is  a  contraction  mapping  with  respect  to  the  weighted  Euclidean  norm. 
See  Appendix  C  for  details.  ■ 
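For reference, the reduced best response in (2.19) can be recovered from (2.18) by inverting the derivative of $h_n^k$. The short derivation below is a sketch that treats both branches of (2.18) at once, since their derivatives share the same form for every $\theta < 0$:

```latex
\begin{aligned}
\frac{\partial h_n^k(x)}{\partial x}
  &= F_{nn}^k \,(\alpha_n^k + F_{nn}^k x)^{\theta} = \lambda \\
\Longrightarrow\quad \alpha_n^k + F_{nn}^k x
  &= \Big(\frac{\lambda}{F_{nn}^k}\Big)^{\frac{1}{\theta}} \\
\Longrightarrow\quad x
  &= \Big(\frac{1}{F_{nn}^k}\Big)^{1+\frac{1}{\theta}} \lambda^{\frac{1}{\theta}}
     - \frac{\alpha_n^k}{F_{nn}^k}.
\end{aligned}
```

Substituting $x = a_n^k + \sum_{m \ne n} F_{mn}^k a_m^k$ and clipping to $[a_{n,k}^{\min}, a_{n,k}^{\max}]$ gives the expression in (2.19). For $\theta = -1$ the first term becomes $1/\lambda$, which is the familiar water level.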


Remark 2.3 (Relation between condition (C3) and the results in reference [SPB08]) For the aforementioned Example 2.2, Scutari et al. established in [SPB08] a sufficient condition under which the iterative water-filling algorithm converges. The iterative water-filling algorithm essentially belongs to best response dynamics. Specifically, in [SPB08], Shannon's formula leads to $\theta = -1$ and cross channel coefficients satisfy $F_{mn}^k \ge 0$, $\forall k, m \ne n$. Equation (2.19) reduces to the water-filling formula

$$l_n^k(\mathbf{a}_{-n}, \lambda) = \bigg[ \frac{1}{\lambda} - \frac{\alpha_n^k}{F_{nn}^k} - \sum_{m \ne n} F_{mn}^k a_m^k \bigg]_{a_{n,k}^{\min}}^{a_{n,k}^{\max}} \tag{2.21}$$

and $[\mathbf{S}^{\max}]_{mn} = \max_k F_{mn}^k$. By choosing the weighted Euclidean norm as the distance measure for the residual errors $\mathbf{a}_n^{t+1} - \mathbf{a}_n^t$ after each best-response iteration, Theorem 2.3 generalizes the results in [SPB08] for the family of utility functions defined in (2.18).


Remark 2.4 (Relation between conditions (C1), (C2) and (C3)) The connections and differences between conditions (C1), (C2) and (C3) are summarized in Table 2.2. We have addressed the implications of (C1) and (C2) in Remark 2.1. Now we discuss their relation with (C3). First of all, condition (C1) is proposed for general $h_n^k(\cdot)$ and condition (C3) is proposed for the class of utility functions defined in (2.18). However, Theorem 2.1 and Theorem 2.3 individually establish the fact that best response dynamics is a contraction map by selecting different vector and matrix norms. Therefore, in general, (C1) and (C3) do not immediately imply each other. Note that $[\mathbf{S}^{\max}]_{mn} \le \zeta_{mn} \cdot \max_k |F_{mn}^k|$, in which $\zeta_{mn}$ satisfies

$$\zeta_{mn} \triangleq \frac{\sum_{k=1}^{K} (F_{mm}^k)^{1 + \frac{1}{\theta}}}{\sum_{k=1}^{K} (F_{nn}^k)^{1 + \frac{1}{\theta}}} \cdot \max_k \frac{(F_{nn}^k)^{1 + \frac{1}{\theta}}}{(F_{mm}^k)^{1 + \frac{1}{\theta}}} \in \bigg[ 1,\ \frac{\max_k (F_{nn}^k / F_{mm}^k)^{1 + \frac{1}{\theta}}}{\min_k (F_{nn}^k / F_{mm}^k)^{1 + \frac{1}{\theta}}} \bigg]. \tag{2.22}$$

The physical interpretation of $\zeta_{mn}$ is the similarity between the preferences of users $m$ and $n$ across the total $K$ dimensions of their action spaces. Recall that both $\mathbf{S}^{\max}$ and $\mathbf{T}^{\max}$ are non-negative matrices and $\mathbf{S}^{\max}$ is element-wise less than or equal to $\max_{m \ne n} \zeta_{mn} \mathbf{T}^{\max}$. By the property of non-negative matrices and condition (C1), we can conclude $\rho(\mathbf{S}^{\max}) \le \rho(\max_{m \ne n} \zeta_{mn} \mathbf{T}^{\max}) < \frac{1}{2} \max_{m \ne n} \zeta_{mn}$. The relation between (C1) and (C3) is pictorially illustrated in Fig. 2.1. Specifically, if users have similar preferences in their available actions and the upper bound of $\zeta_{mn}$ that measures the difference of their preferences is below the following threshold:

$$\frac{\max_{k, m \ne n} (F_{nn}^k / F_{mm}^k)^{1 + \frac{1}{\theta}}}{\min_{k, m \ne n} (F_{nn}^k / F_{mm}^k)^{1 + \frac{1}{\theta}}} < 2, \tag{2.23}$$

we know that (C1) implies (C3) in this situation because $\rho(\mathbf{S}^{\max}) \le \max_{m \ne n} \zeta_{mn} \cdot \rho(\mathbf{T}^{\max}) < 2 \cdot \frac{1}{2} = 1$. We also would like to point out that the LHS of (2.23) is a function of $\theta$ and the LHS $= 1$ if $\theta = -1$. When $\theta = -1$, $\mathbf{T}^{\max}$ coincides with $\mathbf{S}^{\max}$. Mathematically, in this case, (C3) is actually more general than (C2), because it still holds even if coefficients $F_{mn}^k$ have different signs.

Figure 2.1: Relation between (C1) and (C3).
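The interval that bounds the similarity factor $\zeta_{mn}$ can be justified in two lines. The sketch below assumes $\zeta_{mn}$ is the ratio of sums times the largest per-dimension ratio, and introduces the shorthand $p = 1 + 1/\theta$ and $r_k = (F_{nn}^k / F_{mm}^k)^{p}$, which are notation for this derivation only:

```latex
\begin{aligned}
\sum_{k=1}^{K} (F_{nn}^k)^{p}
  &= \sum_{k=1}^{K} r_k \,(F_{mm}^k)^{p}
  \in \Big[\min_k r_k \sum_{k=1}^{K} (F_{mm}^k)^{p},\
           \max_k r_k \sum_{k=1}^{K} (F_{mm}^k)^{p}\Big], \\
\zeta_{mn}
  &= \max_k r_k \cdot
     \frac{\sum_{k=1}^{K} (F_{mm}^k)^{p}}{\sum_{k=1}^{K} (F_{nn}^k)^{p}}
  \in \Big[\,1,\ \frac{\max_k r_k}{\min_k r_k}\Big].
\end{aligned}
```

When $\theta = -1$ we have $p = 0$, so every $r_k = 1$ and $\zeta_{mn} = 1$, which is consistent with the two matrices coinciding in that case.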


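2.3.2 Extensions to General $f_n^k(\cdot)$


As a matter of fact, the results above can be extended to the more general situations in which $f_n^k(\cdot)$ is a nonlinear differentiable function, $\forall n, k$, and its input $\mathbf{a}_{-n}$ consists of the remaining users' actions from all the dimensions. Accordingly, equation (2.12) becomes

$$l_n^k(\mathbf{a}_{-n}, \lambda) \triangleq \bigg[ \Big[\frac{\partial h_n^k}{\partial x}\Big]^{-1}(\lambda) - f_n^k(\mathbf{a}_{-n}) \bigg]_{a_{n,k}^{\min}}^{a_{n,k}^{\max}}. \tag{2.24}$$

The conclusions in Theorems 2.1, 2.2, and 2.3 can be further extended as Theorems 2.4, 2.5, and 2.6, which are listed below. We only provide the proof of Theorem 2.4 in Appendix D. The detailed proofs of Theorems 2.5 and 2.6 are omitted because they can be proven similarly to Theorem 2.4.

For general $f_n^k(\cdot)$, we denote

$$[\mathbf{T}^{\max}]_{mn} \triangleq \begin{cases} \max_{\mathbf{a} \in \mathcal{A}, k'} \sum_{k=1}^{K} \Big| \dfrac{\partial f_n^k(\mathbf{a}_{-n})}{\partial a_m^{k'}} \Big|, & \text{if } m \ne n, \\ 0, & \text{otherwise.} \end{cases} \tag{2.25}$$

Besides, for $h_n^k(\cdot)$ defined in (2.18), we define

$$[\mathbf{S}^{\max}]_{mn} \triangleq \begin{cases} \dfrac{\sum_{k=1}^{K} (F_{mm}^k)^{1 + \frac{1}{\theta}}}{\sum_{k=1}^{K} (F_{nn}^k)^{1 + \frac{1}{\theta}}} \max_{\mathbf{a} \in \mathcal{A}, k'} \bigg\{ \sum_{k=1}^{K} \Big| \dfrac{\partial f_n^k(\mathbf{a}_{-n})}{\partial a_m^{k'}} \Big| \dfrac{(F_{nn}^k)^{1 + \frac{1}{\theta}}}{(F_{mm}^{k'})^{1 + \frac{1}{\theta}}} \bigg\}, & \text{if } m \ne n, \\ 0, & \text{otherwise.} \end{cases} \tag{2.26}$$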


Theorem 2.4 If

$$\rho(\mathbf{T}^{\max}) < \frac{1}{2}, \tag{C4}$$

then there exists a unique NE in game $\Gamma$ and best response dynamics converges linearly to the NE, for any set of initial conditions belonging to $\mathcal{A}$ with either sequential or parallel updates.


Proof :  This  theorem  can  be  proved  by  combining  the  proof  of  Theorem  2.1 
and  the  mean  value  theorem  for  vector-valued  functions.  See  Appendix  D  for 
details.  ■ 

Similarly  as  in  Theorem  2.2,  for  the  general  ACSCG  models  that  exhibit 
strategic  complementarities  (or  strategic  substitutes),  we  can  further  relax  con¬ 
dition  (C4). 

Theorem 2.5 For $\Gamma$ with strategic complementarities (or strategic substitutes), i.e. $\frac{\partial f_n^k(\mathbf{a}_{-n})}{\partial a_m^{k'}} \le 0$, $\forall m \ne n, k, k', \mathbf{a} \in \mathcal{A}$ (or $\ge 0$, $\forall m \ne n, k, k', \mathbf{a} \in \mathcal{A}$), if

$$\rho(\mathbf{T}^{\max}) < 1, \tag{C5}$$

then there exists a unique NE in game $\Gamma$ and best response dynamics converges linearly to the NE, for any set of initial conditions belonging to $\mathcal{A}$ with either sequential or parallel updates.

Theorem 2.6 For $h_n^k(\cdot)$ defined in (2.18), if

$$\rho(\mathbf{S}^{\max}) < 1, \tag{C6}$$

then there exists a unique NE in game $\Gamma$ and best response dynamics converges linearly to the NE, for any set of initial conditions belonging to $\mathcal{A}$ with either sequential or parallel updates.


Remark 2.5 (Implications of conditions (C4), (C5), and (C6)) Based on the mean value theorem, we know that the upper bound of the additive sum of first derivatives $\sum_{k=1}^{K} \big| \frac{\partial f_n^k(\mathbf{a}_{-n})}{\partial a_m^{k'}} \big|$ governs the maximum impact that user $m$'s action can make over user $n$'s utility. As a result, Theorem 2.4, Theorem 2.5, and Theorem 2.6 indicate that $\sum_{k=1}^{K} \big| \frac{\partial f_n^k(\mathbf{a}_{-n})}{\partial a_m^{k'}} \big|$ can be used to develop similar sufficient conditions for the global convergence of best response dynamics. Table 2.2 summarizes the connections and differences among all the aforementioned conditions from (C1) to (C6). We can verify that, for the linear function $f_n^k(\cdot)$ that is defined in (2.9) and studied in Section 2.3.1, $\forall \mathbf{a} \in \mathcal{A}$, $m \ne n$, it satisfies

$$\frac{\partial f_n^k(\mathbf{a}_{-n})}{\partial a_m^{k'}} = \begin{cases} F_{mn}^k, & \text{if } k' = k, \\ 0, & \text{otherwise.} \end{cases} \tag{2.27}$$

In addition, we can see that, in Example 2.4, $f_n^k(\cdot)$ is actually an affine function with

$$\frac{\partial f_n^k(\mathbf{P}_{-n})}{\partial P_m^{k'}} = \begin{cases} \gamma(k - k') H_{mn}^{k'}, & \text{if } m \ne n, \\ 0, & \text{otherwise,} \end{cases} \tag{2.28}$$

and $\mathbf{S}^{\max}$ is reduced to

$$[\mathbf{S}^{\max}]_{mn} = \begin{cases} \max_{k'} \sum_{k=1}^{K} \gamma(k - k') H_{mn}^{k'}, & \text{if } m \ne n, \\ 0, & \text{otherwise.} \end{cases} \tag{2.29}$$
As  an  immediate  result  of  Theorem  2.6,  we  have  the  following  corollary  which 
specifies  a  sufficient  condition  that  guarantees  the  convergence  of  the  iterative 
water-filling  algorithm  for  asynchronous  transmissions  in  multi-carrier  systems 
[CHC07]. 

Corollary 2.1 In Example 2.4, if the matrix $\mathbf{S}^{\max}$ defined in (2.29) satisfies

$$\rho(\mathbf{S}^{\max}) < 1, \tag{2.30}$$

then there exists a unique NE in game $\Gamma$ and the iterative water-filling algorithm converges linearly to the NE, for any set of initial conditions belonging to $\mathcal{A}$ and with either sequential or parallel updates.

Remark 2.6 (Impact of sum constraints) An interesting phenomenon that can be observed from the analysis above is that the convergence condition may depend on the maximum constraints $\{M_n\}_{n=1}^{N}$. This differs from the observation in [SPB08] that the presence of the transmit power and spectral mask constraints does not affect the convergence capability of the iterative water-filling algorithm. This is because when functions $f_n^k(\mathbf{a}_{-n})$ are affine, e.g. in Examples 2.2, 2.3, and 2.4, the elements in $\mathbf{T}^{\max}$ and $\mathbf{S}^{\max}$ are independent of the values of $\{M_n\}_{n=1}^{N}$. Therefore, (C1)-(C6) are independent of $M_n$ for affine $f_n^k(\mathbf{a}_{-n})$. However, for non-linear $f_n^k(\mathbf{a}_{-n})$, the values of $\{M_n\}_{n=1}^{N}$ specify the range of users' joint feasible action set $\mathcal{A}$, and this will affect $\mathbf{T}^{\max}$ and $\mathbf{S}^{\max}$ accordingly. In other words, in the presence of non-linearly coupled $f_n^k(\mathbf{a}_{-n})$, convergence may depend on the players' maximum sum constraints $\{M_n\}_{n=1}^{N}$.
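A small numerical illustration of this dependence: assume, purely for illustration, a two-user, two-dimension game with non-negative actions and a nonlinear coupling $f_n^1(\mathbf{a}_{-n}) = \sqrt{(a_{-n}^1)^2 + 1} - \sqrt{(a_{-n}^2)^2 + 1}$ and $f_n^2 = -f_n^1$. Each partial derivative has magnitude $a / \sqrt{a^2 + 1}$, which grows with $a$, so the largest derivative sum over the feasible set is attained at $a = M$ and the resulting matrix entry depends on the other user's budget.

```python
import math

def coupling_bound(M):
    """Largest value of sum_k |d f^k / d a^{k'}| over a^{k'} in [0, M].

    For the assumed f, both dimensions k contribute a / sqrt(a^2 + 1),
    and this expression is increasing in a, so the maximum is at a = M.
    """
    return 2.0 * M / math.sqrt(M * M + 1.0)

def satisfies_half_threshold(M1, M2):
    """Check rho(T) < 1/2 for the 2x2 zero-diagonal matrix T.

    With off-diagonal entries t12 and t21, rho(T) = sqrt(t12 * t21).
    """
    rho = math.sqrt(coupling_bound(M1) * coupling_bound(M2))
    return rho < 0.5

small_budget_ok = satisfies_half_threshold(0.2, 0.2)
large_budget_ok = satisfies_half_threshold(1.0, 1.0)
threshold = 1.0 / math.sqrt(15.0)   # equal budgets below this value pass the check
```

With equal budgets, the spectral-radius check passes only when $M < 1/\sqrt{15} \approx 0.258$ under these assumptions, so enlarging the budgets alone can move the game outside the guaranteed region.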
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2.3.3  Connections  to  the  Results  of  Rosen  and  Gabay 


In  [Ros65] ,  Rosen  proposed  a  continuous-time  gradient  projection  based  iterative 
algorithm  to  obtain  a  pure  NE  under  the  assumption  of  DSC  conditions.  Here 
we  present  a  discrete  version  of  the  algorithm  in  [Ros65] ,  named  “ gradient  play ” . 
Specifically,  at  stage  t ,  each  user  first  determines  the  gradient  of  its  own  util¬ 
ity  function  un(anj  a/l).  Then  each  user  updates  its  action  using  gradient 
projection  according  to 


ay  =  a: 'If-1  +  «, 


dur 


(2.31) 


and 

2 

i 

where  Kn  is  the  stepsize  and  [v]^2  denotes  the  projection  of  the  vector  v  onto 
user  n’s  action  set  An  with  respect  to  the  Euclidean  norm  ||  •  |[2.  If  Kn  is  chosen 
to  be  sufficiently  small,  gradient  play  approximates  the  continuous-time  gradient 
projection  algorithm.  For  each  nonnegative  vector  k  =  [fiy  . . .  %],  define 


(2.32) 


al  =  la1/ a2/  ■  ■  ■  a/*]  = 


a'/ta'/t  ■  ■  ■  a/'1 


#(a, k)  —  [/ciViMi(a)  K2V2u2(a)  ...  knV NuN(a)]T .  (2.33) 


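The definition of DSC in [Ros65] is that, for fixed $\boldsymbol{\kappa} > 0$ and every $\mathbf{a}^0, \mathbf{a}^1 \in \mathcal{A}$,
we have

$$ (\mathbf{a}^1 - \mathbf{a}^0)^T g(\mathbf{a}^0, \boldsymbol{\kappa}) + (\mathbf{a}^0 - \mathbf{a}^1)^T g(\mathbf{a}^1, \boldsymbol{\kappa}) > 0. \tag{2.34} $$

A sufficient condition for DSC is that the symmetric matrix $G(\mathbf{a}, \boldsymbol{\kappa}) + G^T(\mathbf{a}, \boldsymbol{\kappa})$
be negative definite for $\mathbf{a} \in \mathcal{A}$, where $G(\mathbf{a}, \boldsymbol{\kappa})$ is the Jacobian with respect to $\mathbf{a}$
of $g(\mathbf{a}, \boldsymbol{\kappa})$.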
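The discrete gradient play iteration in (2.31) and (2.32) can be sketched in a few lines. The following is only an illustration under stated assumptions, not the implementation used in this chapter: it assumes each action set has the form $\{\mathbf{a}_n \ge 0, \sum_k a_n^k \le M_n\}$, computes the Euclidean projection by bisection on a shift $\tau$, and uses a hypothetical two-user, two-dimension toy utility $u_n = \sum_k \log(1 + a_n^k + 0.1\, a_m^k)$ whose gradient is hard-coded in `toy_gradient`.

```python
def project_sum_box(v, M, iters=60):
    """Euclidean projection of v onto {a >= 0, sum(a) <= M} (assumed action set)."""
    clipped = [max(x, 0.0) for x in v]
    if sum(clipped) <= M:
        return clipped
    lo, hi = 0.0, max(v)  # shift tau lies in [lo, hi]
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if sum(max(x - tau, 0.0) for x in v) > M:
            lo = tau
        else:
            hi = tau
    return [max(x - hi, 0.0) for x in v]


def gradient_play(grad, a0, M, kappa, steps):
    """Parallel gradient play: every user steps along its own gradient, then projects."""
    a = [list(row) for row in a0]
    for _ in range(steps):
        g = [grad(n, a) for n in range(len(a))]  # all gradients use the previous stage
        a = [
            project_sum_box([x + kappa * d for x, d in zip(a[n], g[n])], M[n])
            for n in range(len(a))
        ]
    return a


def toy_gradient(n, a, coupling=0.1):
    """Gradient of the hypothetical utility sum_k log(1 + a_n^k + coupling * a_m^k)."""
    other = a[1 - n]
    return [1.0 / (1.0 + a[n][k] + coupling * other[k]) for k in range(len(a[n]))]


if __name__ == "__main__":
    print(gradient_play(toy_gradient, [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0], 0.2, 500))
```

For this symmetric toy game the iteration settles at the equal split of each user's budget across the two dimensions; a larger stepsize or stronger coupling may break convergence, which is the tradeoff discussed above.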

Table 2.3: A summary of various convergence conditions in concave games.

Algorithms     | Sufficient conditions and the applicable games
Gradient play  | Rosen's DSC conditions for concave games [Ros65]
Best response  | Gabay's dominance solvability condition for concave games with $\mathcal{A}_n = \mathcal{R}_+$ [GM80]; conditions (C1)-(C6) for ACSCG

However, when using gradient play to search for a pure NE, the stepsize
$\kappa_n$ needs to be carefully chosen and set to be sufficiently small, which usually
slows down the rate of convergence. As an alternative distributed algorithm, for
concave games with $\mathcal{A}_n = \mathcal{R}_+$, $\forall n \in \mathcal{N}$, Gabay and Moulin provided in [GM80]
a dominance solvability condition under which best response dynamics globally
converges to a unique NE. Specifically, the dominance solvability condition is
given by


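$$ \left| \frac{\partial^2 u_n}{\partial^2 a_n} \right| > \sum_{m \neq n} \left| \frac{\partial^2 u_n}{\partial a_n \partial a_m} \right|. \tag{2.35} $$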
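As a small illustration, the dominance solvability condition can be checked numerically at a given action profile. The sketch below assumes the condition is read as strict diagonal dominance (each user's own second derivative outweighs the sum of the absolute cross derivatives); the function name and the matrix layout are hypothetical, and a full check would have to hold at every feasible action profile, not just one.

```python
def is_dominance_solvable(second_derivs):
    """second_derivs[n][m] holds d^2 u_n / (d a_n d a_m) at one action profile.

    Returns True when |d^2 u_n / d a_n^2| strictly exceeds the sum of
    |d^2 u_n / (d a_n d a_m)| over m != n, for every user n.
    """
    for n, row in enumerate(second_derivs):
        cross = sum(abs(x) for m, x in enumerate(row) if m != n)
        if not abs(row[n]) > cross:
            return False
    return True


if __name__ == "__main__":
    weakly_coupled = [[-2.0, 0.5, 0.5], [0.3, -1.0, 0.3], [1.0, 1.0, -3.0]]
    strongly_coupled = [[-1.0, 1.5], [0.2, -1.0]]
    print(is_dominance_solvable(weakly_coupled), is_dominance_solvable(strongly_coupled))
```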


The sufficient conditions provided in this section and Gabay's dominance solvability
condition specify the convergence conditions of best response dynamics
in different subclasses of concave games. Specifically, our results are developed
for concave games in which every user has a multi-dimensional action space subject
to a single sum-constraint, whereas Gabay's dominance solvability condition
applies to concave games in which every user has a single-dimensional strategy.


2.4  Scenario  II:  Message  Exchange  among  Users 


In  this  section,  our  objective  is  to  coordinate  the  users’  actions  in  ACSCG  to 
maximize  the  overall  performance  of  the  system,  measured  in  terms  of  their  total 
utilities,  in  a  distributed  fashion.  Specifically,  the  optimization  problem  we  want 
to  solve  is 


$$ \max_{\mathbf{a} \in \mathcal{A}} \sum_{n=1}^{N} u_n(\mathbf{a}). \tag{2.36} $$
We will study two distributed algorithms in which the participating users exchange
price signals that indicate the "cost" or "benefit" that their actions cause
to the other users. Allocating network resources via pricing has been well
investigated for convex NUM problems [CLC07], where the original NUM problem
can be decomposed into distributedly solvable subproblems by setting a price
for each constrained resource, and each subproblem has to decide the amount of
resources to be used depending on the charged price. However, unlike in the
conventional convex NUM, pricing mechanisms may not be immediately applicable
in ACSCG if the objective in (2.36) is not jointly concave in $\mathbf{a}$. Therefore,
we are interested in characterizing the convergence conditions of different pricing
algorithms in ACSCG.

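We know that for any local maximum $\mathbf{a}^*$ of problem (2.36), there exist Lagrange
multipliers $\lambda_n, \nu_n^1, \cdots, \nu_n^K$ and $\nu_n'^1, \cdots, \nu_n'^K$ such that the following
Karush-Kuhn-Tucker (KKT) conditions hold for all $n \in \mathcal{N}$:

$$ \frac{\partial u_n(\mathbf{a}^*)}{\partial a_n^k} + \sum_{m \neq n} \frac{\partial u_m(\mathbf{a}^*)}{\partial a_n^k} = \lambda_n + \nu_n^k - \nu_n'^k, \quad \forall k \tag{2.37} $$

$$ \lambda_n \Big( \sum_{k=1}^{K} a_n^{k*} - M_n \Big) = 0, \quad \lambda_n \ge 0 \tag{2.38} $$

$$ \nu_n^k (a_n^{k*} - a_{n,k}^{\max}) = 0, \quad \nu_n'^k (a_{n,k}^{\min} - a_n^{k*}) = 0, \quad \nu_n^k, \nu_n'^k \ge 0. \tag{2.39} $$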

Denote by $\pi_{mn}^k$ user $m$'s marginal fluctuation in utility per unit decrease in user $n$'s
action $a_n^k$ within the $k$th dimension,

$$ \pi_{mn}^k(a_m^k, \mathbf{a}_{-m}^k) = -\frac{\partial u_m(\mathbf{a})}{\partial a_n^k}, \tag{2.40} $$

which is announced by user $m$ to user $n$ and can be viewed as the cost charged
(or compensation paid) to user $n$ for changing user $m$'s utility. Using (2.40),
equation (2.37) can be rewritten as

$$ \frac{\partial u_n(\mathbf{a}^*)}{\partial a_n^k} - \sum_{m \neq n} \pi_{mn}^k(a_m^{k*}, \mathbf{a}_{-m}^{k*}) = \lambda_n + \nu_n^k - \nu_n'^k. \tag{2.41} $$

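If we assume fixed prices $\pi_{mn}^k$ and action profile $\mathbf{a}_{-n}$, condition (2.41) gives the
necessary and sufficient KKT condition of the following problem:

$$ \max_{\mathbf{a}_n \in \mathcal{A}_n} \; u_n(\mathbf{a}) - \sum_{k=1}^{K} a_n^k \cdot \Big( \sum_{m \neq n} \pi_{mn}^k \Big). \tag{2.42} $$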


At an optimum, a user behaves as if it maximizes the difference between its
utility and its payment to the other users in the network due to its impact on
the other users' utilities. Different distributed pricing mechanisms can be developed
based on the individual objective function in (2.42), and the convergence
conditions may also vary based on the specific action update equation.


When  optimization  program  (2.36)  is  not  convex,  the  pricing  algorithms  de¬ 
veloped  for  convex  NUM,  e.g.  gradient  and  subgradient  algorithms,  cannot  be 
directly  applied.  In  the  next  two  subsections,  we  will  investigate  two  distributed 
pricing  mechanisms  for  non-convex  ACSCG  and  provide  two  sufficient  conditions 
that  guarantee  their  convergence.  Specifically,  under  these  sufficient  conditions, 
both  algorithms  guarantee  that  the  total  utility  is  monotonically  increasing  un¬ 
til  it  converges  to  a  feasible  operating  point  that  satisfies  the  KKT  conditions. 
As in Section 2.3.1, we first assume $f_n^k(\mathbf{a}_{-n})$ takes the form in (2.9) and
users update their actions in parallel.


2.4.1  Gradient  Play 

The first distributed pricing algorithm that we consider is gradient play. The
update iterations of gradient play need to be properly redefined in the presence of
real-time information exchange. Specifically, at stage $t$, users adopting this
algorithm exchange price signals $\{\pi_{mn}^{k,t-1}\}$ using the gradient information at stage
$t - 1$. Within each iteration, each user first determines the gradient of the
objective in (2.42) based on the price vectors $\{\pi_{mn}^{k,t-1}\}$ and its own utility function
$u_n(\mathbf{a}_n, \mathbf{a}_{-n}^{t-1})$. Then each user updates its action $\mathbf{a}_n^t$ using the gradient projection
algorithm according to

$$ a_n'^{k,t} = a_n^{k,t-1} + \kappa \Big( \frac{\partial u_n(\mathbf{a}_n^{t-1}, \mathbf{a}_{-n}^{t-1})}{\partial a_n^k} - \sum_{m \neq n} \pi_{mn}^{k,t-1} \Big) \tag{2.43} $$

and

$$ \mathbf{a}_n^t = [a_n^{1,t}\ a_n^{2,t}\ \cdots\ a_n^{K,t}] = \big[a_n'^{1,t}\ a_n'^{2,t}\ \cdots\ a_n'^{K,t}\big]_{\mathcal{A}_n}^{\|\cdot\|_2}, \tag{2.44} $$

in which the stepsize $\kappa > 0$. The following theorem provides a sufficient condition
under which gradient play will converge monotonically provided that we choose a
small enough constant stepsize $\kappa$.

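Theorem 2.7 If $\forall n, k$, $\mathbf{x}, \mathbf{y} \in \mathcal{A}_{-n}$,

$$ \inf_x \frac{\partial^2 h_n^k(x)}{\partial^2 x} > -\infty, \quad \text{and} \quad \|\nabla g_n^k(\mathbf{x}) - \nabla g_n^k(\mathbf{y})\| \le L' \|\mathbf{x} - \mathbf{y}\|, \tag{C7} $$

gradient play converges for a small enough stepsize $\kappa$.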


Proof :  This  theorem  can  be  proved  by  showing  the  gradient  of  the  objective 
function  in  (2.36)  is  Lipschitz  continuous  and  applying  Proposition  3.4  in  [BT97]. 
See  Appendix  E  for  details.  ■ 
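To make the priced gradient play iteration concrete, here is a minimal sketch on a hypothetical toy instance that is not taken from this chapter: two users, scalar actions in $[0, M]$, and utilities $u_n = \log(1 + a_n) - C a_m$, so that the price announced by user $m$ to user $n$ is the constant $\pi_{mn} = C$. The constants `C`, `M` and all function names are assumptions made for illustration. With scalar actions the projection in (2.44) reduces to clipping.

```python
import math

C = 0.5  # hypothetical cost that one unit of a user's action imposes on the other user
M = 3.0  # hypothetical upper bound on each scalar action


def utility(n, a):
    others = sum(a[m] for m in range(len(a)) if m != n)
    return math.log(1.0 + a[n]) - C * others


def own_gradient(n, a):
    return 1.0 / (1.0 + a[n])


def price(m, n, a):
    # pi_mn = -d u_m / d a_n, which is the constant C for this toy utility
    return C


def priced_gradient_play(a0, kappa, steps):
    a = list(a0)
    totals = [sum(utility(n, a) for n in range(len(a)))]
    for _ in range(steps):
        prev = list(a)  # parallel update: everyone uses the previous stage
        for n in range(len(a)):
            charged = sum(price(m, n, prev) for m in range(len(a)) if m != n)
            step = prev[n] + kappa * (own_gradient(n, prev) - charged)
            a[n] = min(max(step, 0.0), M)  # projection onto [0, M]
        totals.append(sum(utility(n, a) for n in range(len(a))))
    return a, totals


if __name__ == "__main__":
    final, totals = priced_gradient_play([3.0, 3.0], 0.5, 400)
    print(final, totals[0], totals[-1])
```

Without prices each user would push its action to the bound $M$; with prices the iteration moves to the point where $1/(1 + a_n) = C$, and the total utility recorded in `totals` does not decrease along the way for this stepsize.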

Remark 2.7 (Application of condition (C7)) A sufficient condition that guarantees
the convergence of the distributed gradient projection algorithm is the Lipschitz
continuity of the gradient of the objective function in (2.36). For example,
in the power control problem in multi-channel networks [HBH06], we have
$h_n^k(x) = \log_2(\sigma_n^k + H_{nn}^k x)$ and $g_n^k(\mathbf{P}_{-n}) = \log_2(\sigma_n^k + \sum_{m \neq n} H_{mn}^k P_m^k)$. For this
configuration, we can immediately verify that condition (C7) is satisfied. Therefore,
gradient play can be applied. Moreover, as in [HBH06], if we can further ensure
that the problem in (2.36) is convex for some particular utility functions, gradient
play converges to the unique optimal solution of (2.36), since in that case satisfying
the KKT conditions implies global optimality.


2.4.2  Jacobi  Update 

We next consider an alternative strategy update mechanism called Jacobi update
[LA02]. In Jacobi update, every user adjusts its action gradually towards the
best response strategy. Specifically, the maximizer of problem (2.42) takes the
following form

$$ B_n'^k(\mathbf{a}_{-n}) = \Big\{ \frac{\partial h_n^k}{\partial x} \Big\}^{-1} \Big( \lambda_n + \nu_n^k - \nu_n'^k + \sum_{m \neq n} \pi_{mn}^k \Big) - \sum_{m \neq n} F_{mn}^k a_m^k, \tag{2.45} $$

in which $\lambda_n$, $\nu_n^k$, and $\nu_n'^k$ are the Lagrange multipliers that satisfy complementary
slackness in (2.38) and (2.39), and $\pi_{mn}^k$ is defined in (2.40). In Jacobi update, at
stage $t$, user $n$ chooses its action according to

$$ a_n^{k,t} = a_n^{k,t-1} + \kappa \big[ B_n'^k(\mathbf{a}_{-n}^{t-1}) - a_n^{k,t-1} \big], \tag{2.46} $$

in which the stepsize $\kappa \in (0, 1]$. The following theorem establishes a sufficient
convergence condition for Jacobi update.


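Theorem 2.8 If $\forall n, k$, $\mathbf{x}, \mathbf{y} \in \mathcal{A}_{-n}$,

$$ \inf_x \frac{\partial^2 h_n^k(x)}{\partial^2 x} > -\infty, \quad \sup_x \frac{\partial^2 h_n^k(x)}{\partial^2 x} < 0, \quad \text{and} \quad \|\nabla g_n^k(\mathbf{x}) - \nabla g_n^k(\mathbf{y})\| \le L' \|\mathbf{x} - \mathbf{y}\|, \tag{C8} $$

Jacobi update converges if the stepsize $\kappa$ is sufficiently small.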


Proof :  This  can  be  proved  using  the  descent  lemma  and  the  mean  value 
theorem.  The  details  of  the  proof  are  provided  in  Appendix  F.  ■ 
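The Jacobi update can be illustrated on a small hypothetical example that is not one of the chapter's examples: two users with scalar actions in $[0, M]$ and utilities $u_n = \log(1 + a_n + F a_m) - C a_m$. For this toy utility the price is $\pi_{mn} = C - F/(1 + a_m + F a_n)$ and the maximizer of the priced local problem is $1/\pi_{mn} - 1 - F a_m$, clipped to $[0, M]$. The constants `F`, `C`, `M` and the function names are assumptions chosen for illustration only.

```python
import math

F = 0.1  # hypothetical coupling weight
C = 0.5  # hypothetical cost coefficient
M = 3.0  # hypothetical upper bound on each scalar action


def total_utility(a):
    return sum(math.log(1.0 + a[n] + F * a[1 - n]) - C * a[1 - n] for n in range(2))


def price(m, n, a):
    # pi_mn = -d u_m / d a_n for the toy utility
    return C - F / (1.0 + a[m] + F * a[n])


def priced_best_response(n, a):
    """Maximizer of user n's priced local problem with prices and a_m held fixed."""
    m = 1 - n
    target = 1.0 / price(m, n, a) - 1.0 - F * a[m]
    return min(max(target, 0.0), M)


def jacobi_update(a0, kappa, steps):
    a = list(a0)
    totals = [total_utility(a)]
    for _ in range(steps):
        prev = list(a)  # all users move in parallel from the previous stage
        a = [prev[n] + kappa * (priced_best_response(n, prev) - prev[n]) for n in range(2)]
        totals.append(total_utility(a))
    return a, totals


if __name__ == "__main__":
    final, totals = jacobi_update([3.0, 0.0], 0.05, 2000)
    print(final, totals[0], totals[-1])
```

In this toy problem the total utility is concave, so the limit is its unique maximizer, at which both actions equal 12/11; the small stepsize keeps the total utility from decreasing between stages.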
Remark 2.8 (Relation between condition (C8) and the result in [SBH08]) Shi
et al. considered the power allocation for multi-carrier wireless networks with
non-separable utilities. Specifically, $u_n(\cdot)$ takes the form

$$ u_n(\mathbf{P}) = r_n \Big( \sum_{k=1}^{K} \log_2 \Big( 1 + \frac{H_{nn}^k P_n^k}{\sigma_n^k + \sum_{m \neq n} H_{mn}^k P_m^k} \Big) \Big), \tag{2.47} $$

in which $r_n(\cdot)$ is an increasing and strictly concave function. Since the utilities
are non-separable, the distributed pricing algorithm proposed in [SBH08], which
is in fact an instance of Jacobi update, requires only one user to update its action profile
at each stage while keeping the remaining users' actions fixed. The condition in
(C8) gives the convergence condition of the same algorithm in ACSCG. We prove
in Theorem 2.1 that, if the utilities are separable, convergence can still be achieved
even if these users update their actions at the same time. Therefore, we do not
need an arbitrator to select the single user that updates its action at each stage.


Remark 2.9 (Complexity of signaling) The complexity of message exchange, measured
in terms of the number of price signals to update in (2.40), is generally
of the order of $O(KN^2)$. It is worth mentioning that the amount of signaling
can be further reduced to $O(KN)$ in the scenarios where $g_n^k(\cdot)$ are functions of
$\sum_{m \neq n} F_{mn}^k a_m^k$. In this case, each user only needs to announce one price signal $\pi_n^k$
for each dimension of its action space:

$$ \pi_n^k(a_n^k, \mathbf{a}_{-n}^k) = -\frac{\partial u_n(\mathbf{a})}{\partial \big( \sum_{m \neq n} F_{mn}^k a_m^k \big)}. \tag{2.48} $$

Consequently, $\pi_{mn}^k$ can be determined based on $\pi_{mn}^k = F_{nm}^k \pi_m^k$, which greatly reduces
the signaling overhead. It is straightforward to check that
only $O(KN)$ messages need to be generated and exchanged per iteration for both
utility functions (2.3) and (2.4).
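The two signaling orders can be compared with a short count. This sketch assumes, for illustration only, that in the general case every user announces one price per other user and per dimension (giving $KN(N-1)$ signals, which is of order $KN^2$), and that in the aggregate case every user broadcasts a single price per dimension; the function name is hypothetical.

```python
def price_signals_per_iteration(num_users, num_dims, aggregate=False):
    """Count price signals generated per iteration under the stated assumptions."""
    if aggregate:
        # one broadcast price per user and per dimension
        return num_dims * num_users
    # one price per ordered pair of distinct users and per dimension
    return num_dims * num_users * (num_users - 1)


if __name__ == "__main__":
    print(price_signals_per_iteration(5, 3), price_signals_per_iteration(5, 3, aggregate=True))
```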
Remark 2.10 (Extension to general cases) As a matter of fact, conditions (C7)
and (C8) apply to a broader class of multi-user interaction scenarios, including
the general model defined in (2.2). Specifically, as addressed in Remark 2.7, the
Lipschitz continuity of the gradient of $\sum_{n=1}^{N} u_n(\mathbf{a})$ is sufficient to guarantee that
gradient play with a small enough stepsize achieves an operating point at which
KKT conditions are satisfied. In addition, we can use the same technique as in
Appendix F to show the convergence of Jacobi update given that $\sup_x \frac{\partial^2 h_n^k(x)}{\partial^2 x} < 0$,
$\forall n, k$, and the gradient of $\sum_{n=1}^{N} u_n(\mathbf{a})$ is Lipschitz continuous.


2.5  Numerical  Examples 


In Section 2.2.2, we presented several illustrative examples of ACSCG. This section
uses Examples 2.1 and 2.3 to illustrate the various distributed algorithms
discussed in the chapter.

We start with Example 2.1 to verify the proposed convergence conditions of
best response dynamics. Even though it is a simple two-user game with $\mathcal{A}_n \subset \mathcal{R}^2$,
existing results in the literature cannot immediately determine whether or not
the best response dynamics in this simple game can globally converge to an NE.
Specifically, in Example 2.1, we have


dfn(z-n) 

dafn 

9fn(  a~n) 


fl-Tt  3/n(a-n) 

\f (a-n)2  +  1  da-n 

al_n  df2n  (a_. 


daln  V(a~n)2  +  l’  da~r 
According  the  definition  of  (2.25),  we  have 


V  (a-n)2  +  1 

Q-n 

V  (a-n)2  +  1 


j  12  =  max  <{  maxag_4  Y,k=i 

2  a 


dfltja-n) 


=  max  <  maxai ga,  ,  , 

'  11  \/RE+T 

Similarly,  we  can  obtain  [T 


da\ 

,  maxaie_41 


,  maxae_4  Ylk=i 

2a?  1  _  2 Mi 


dfjia-n) 

daf 


maxn  _  2M2 

21  —  - 


VMi+ 1 ' 


\J  (ai)2+i  /  VMi2+i ' 
Therefore,  p(Tmax) 


(2.49) 


(2.50) 


4  Mi  M2 


y/M1+l  aJmZ+1  ' 
Figure 2.2: Actions versus iterations in Example 2.1.

It  is  easy  to  show  that  p(Tmax)  <  1  (Mf  —  |)(Mf  —  |)  <  |.  By  condition 
(C4),  we  know  that  if  (Mf  —  |)(Mf  —  |)  <  the  best  response  dynamics  is 
guaranteed  to  converge  to  a  unique  NE.  We  numerically  simulate  a  scenario  with 
parameters  =  |  and  M2  =  1  in  which  condition  (C4)  holds.  We  generate 
multiple  initial  action  profiles  of  and  a2,  iterate  the  best  response  dynamics, 
and  obtain  the  action  sequences  aj  and  af>.  Fig.  2.2  shows  the  trajectories  of 
a}'*  and  a2’*  for  different  realizations.  We  can  see  that,  best  response  dynamics 
converges  to  a  unique  NE.  If  we  set  Mi  =  2  and  M2  =  1,  condition  (C4)  does 
not  hold  any  more.  We  observe  from  simulations  that  in  many  circumstances 
the  best  response  dynamics  will  not  converge,  which  agrees  with  our  analysis  in 
Remark  2.6. 

Figure 2.3: Probability of (C2) and (C3) versus $1 - r_{m0}^k$ for $\forall m, k$; $N = 5$, $K = 3$.

Now we consider Example 2.3, which is the problem of minimizing queueing
delays in a Jackson network. In particular, we consider a network with $N = 5$
nodes and $K = 3$ traffic classes. The total routing probability $1 - r_{m0}^k$ that node $m$
will route packets of class $k$ completing service to other nodes is the same for $\forall m \in
\mathcal{N}$. We varied the total routing probability $1 - r_{m0}^k$ and generated multiple sets of
network  parameters  in  which  are  uniformly  distributed  for  n  —  1,  2,  ■  ■  •  ,N, 
fi^  are  uniformly  selected  in  [4,  5]  for  Vrt,  k,  and  are  uniformly  chosen  in 
[0.6, 1]  for  n  =  1,  2,  •  •  •  ,  N. 

First  of  all,  we  compare  the  range  of  validity  of  the  proposed  convergence 
conditions.  As  we  mentioned  before,  we  have  FA  =  A  in  this  example. 

Note  that  (I  — Rfe)_1  =  I  +  ^°l1(Rfc)1  and  R/'  is  a  non-negative  matrix.  Therefore, 
we  can  conclude  >  0 ,Vm  ^  n,k.  Moreover,  since  h*(x)  =  — i— ^ —  we 
choose  to  compare  conditions  (C2)  and  (C3).  In  Fig.  2.3,  we  plot  the  probability 
that conditions (C2) and (C3) are satisfied versus the total routing probability
$1 - r_{m0}^k$. From Fig. 2.3, we can see that the probability of guaranteeing convergence
decreases as the routing probability $1 - r_{m0}^k$ increases, and condition (C3) shows
a similar but slightly broader validity than (C2). Fig. 2.4 shows the delay
trajectories of three nodes using both sequential and parallel updates in a certain
network realization in which (C2) and (C3) are satisfied. We can see that the
parallel update converges faster than the sequential update.

Figure 2.4: Delays of nodes versus iterations.

In Fig. 2.3, we also note that the probability that (C2) or (C3) is satisfied
drops sharply from almost certain convergence to no convergence
guarantee as $1 - r_{m0}^k$ varies from 0.5 to 0.58. Similar observations have been drawn
in the multi-channel power control problem [SPB08], where $\theta = -1$ in (2.18) and
the probability that condition (C3) is satisfied exhibits a neat threshold behavior
as the ratio between the source-interferer distance and the source-destination
distance varies. In Jackson networks, this threshold can be roughly estimated.
Define $[\mathbf{S}^k]_{mn} = F_{mn}^k$ for $m \neq n$ and $[\mathbf{S}^k]_{nn} = 0$ for $n \in \mathcal{N}$. If we fix $1 - r_{m0}^k$
for $\forall m, k$, we prove in Appendix G that $\rho(\mathbf{S}^k) \le \frac{1}{r_{m0}^k} - 1$ for $\forall k$. Therefore,
$\rho(\mathbf{S}^k) < 1$ when $r_{m0}^k > 0.5$. We would like to estimate $\rho(\mathbf{T}^{\max})$ and $\rho(\mathbf{S}^{\max})$ based
on $\rho(\mathbf{S}^k)$. Note that $\mathbf{T}^{\max}$ defined in (2.16) is the element-wise maximum over
$\mathbf{S}^k$ for $k = 1, 2, \ldots, K$. Since $\mathbf{T}^{\max}$ and $\mathbf{S}^k$ are all non-negative matrices, we know
that $\rho(\mathbf{T}^{\max}) \ge \max_k \rho(\mathbf{S}^k)$. In addition, recall the effect of $\max_{m \neq n} \zeta_{mn}$ discussed
in Remark 2.4. We can approximate $\rho(\mathbf{S}^{\max})$ defined in (2.20) using $\rho(\mathbf{S}^{\max}) \approx
\max_{m,n} \zeta_{mn} \max_k \rho(\mathbf{S}^k)$. Therefore, we expect that $\rho(\mathbf{T}^{\max})$ and $\rho(\mathbf{S}^{\max})$ exceed
1 for $r_{m0}^k < 0.5$, which agrees with our observation from Fig. 2.3. The physical
interpretation is that, if the packets exit the network with a probability less than
50% after completing their service, i.e. more than half of the served packets will be
routed to other nodes, the mutual coupling among users becomes
too strong and the multi-user interaction in Jackson networks will gradually lose
its convergence guarantee.

Figure 2.5: Illustration of convergence for gradient play and Jacobi update (curves: Nash equilibrium, gradient play, Jacobi update; horizontal axis: iteration).
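The 50% threshold follows from a one-line rearrangement of the Appendix G bound. Writing $r$ for the common exit probability $r_{m0}^k$ (a shorthand introduced here only for this worked step):

```latex
\frac{1}{r} - 1 < 1
\;\iff\; \frac{1}{r} < 2
\;\iff\; r > \frac{1}{2}
\;\iff\; 1 - r < \frac{1}{2}.
% Example: at the right end of the transition region, 1 - r = 0.58 gives
% r = 0.42 and the bound 1/0.42 - 1 \approx 1.38 > 1, so the bound no longer
% certifies \rho(\mathbf{S}^k) < 1.
```

So the bound certifies $\rho(\mathbf{S}^k) < 1$ exactly when the total routing probability $1 - r$ is below 0.5, which is where the transition in Fig. 2.3 begins.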

In addition, we numerically compare two distributed algorithms in which users
pass coordination messages in real time, namely Jacobi update and gradient
play. Fig. 2.5 shows the delay evolution of both distributed solutions for a
particular simulated network in which we set $\kappa = 0.2$. We initialize the system
parameters such that $\inf_{n,k} \mu_n^k - \sum_{m=1}^{N} v_{mn}^k \psi_m^k > 0$ and both conditions (C7) and
(C8) are satisfied. We can verify that for Example 2.3, problem (2.36) is in fact a
convex program. Therefore, there exists a unique operating point at which KKT
conditions (2.37)-(2.39) are satisfied. We can see that both algorithms cause the
total delay to monotonically decrease until it reaches the same performance limit,
which is strictly better than the NE. Using the same stepsize $\kappa$, Jacobi update converges
more quickly than gradient play in this example. Similar observations are drawn
in the other simulated examples. This is because the update directions of these
two algorithms are different. Jacobi update moves directly towards the optimal
solution of (2.42), which is a local approximation of the original optimization
program in (2.36), whereas the gradient play algorithm simply updates the actions
along the gradient direction of (2.36).

2.6  Concluding  Remarks 

In  this  chapter,  we  propose  and  investigate  a  new  game  model,  which  we  refer 
to  as  additively  coupled  sum  constrained  games,  in  which  each  player  is  subject 
to  a  sum  constraint  and  its  utility  is  additively  impacted  by  the  remaining  users’ 
actions.  The  convergence  properties  of  various  generic  distributed  adjustment 
algorithms,  including  best  response,  gradient  play,  and  Jacobi  update,  have  been 
investigated.  The  sufficient  conditions  obtained  in  this  chapter  generalize  the 
existing  results  developed  in  the  multi-channel  power  control  problem  and  can 
be  extended  to  other  applications  that  belong  to  ACSCG. 

2.7  Appendix  A:  Proof  of  Theorem  2.1 

The  following  lemma  is  needed  to  prove  Theorem  2.1. 
Lemma 2.1 Consider any non-decreasing function $p(x)$ and non-increasing function
$q(x)$. If there exists a unique $x^*$ such that $p(x^*) = q(x^*)$, and the functions
$p(x)$ and $q(x)$ are strictly increasing and strictly decreasing at $x = x^*$ respectively,
then $x^* = \arg\min_x \{\max\{p(x), q(x)\}\}$.


Proof of Lemma 2.1: See Lemma 1 in [CHC07]. ■
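Lemma 2.1 can be checked numerically on a simple pair of functions. The sketch below uses the hypothetical choices $p(x) = x^2$ (non-decreasing on $[0, 2]$) and $q(x) = 2 - x$ (non-increasing), which cross only at $x^* = 1$, and confirms by grid search that the crossing point minimizes $\max\{p(x), q(x)\}$. The grid and the function choices are assumptions made for illustration.

```python
def argmin_of_max(p, q, grid):
    """Return the grid point that minimizes max(p(x), q(x))."""
    return min(grid, key=lambda x: max(p(x), q(x)))


def p(x):
    return x * x  # non-decreasing on [0, 2]


def q(x):
    return 2.0 - x  # non-increasing


if __name__ == "__main__":
    grid = [i / 1000.0 for i in range(2001)]  # points 0.000, 0.001, ..., 2.000
    print(argmin_of_max(p, q, grid))
```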


Denote $a_n^{k,t}$ as the action of user $n$ in the $k$th dimension after iteration $t$. Recall
that $[h_n^k]'(\cdot) > 0$ for $\forall n, k$. Therefore, $\sum_{k=1}^{K} a_n^{k,t} = M_n$ is satisfied at the end of
any iteration $t$ for any user $n$. Define $[x]^+ = \max\{x, 0\}$ and $[x]^- = \max\{-x, 0\}$.
It is straightforward to see that

$$ \sum_{k=1}^{K} [a_n^{k,t+1} - a_n^{k,t}]^+ = \sum_{k=1}^{K} [a_n^{k,t+1} - a_n^{k,t}]^-. \tag{2.51} $$

We also define

$$ p^{n,t}(x) = \sum_{k=1}^{K} \big[ l_n^k(\mathbf{a}_{-n}^t, x) - a_n^{k,t} \big]^- \tag{2.52} $$

and

$$ q^{n,t}(x) = \sum_{k=1}^{K} \big[ l_n^k(\mathbf{a}_{-n}^t, x) - a_n^{k,t} \big]^+, \tag{2.53} $$

in which $l_n^k(\cdot)$ is defined in (2.12). Since $h_n^k(\cdot)$ is a continuous, increasing and
strictly concave function, it is clear that $\{\frac{\partial h_n^k}{\partial x}\}^{-1}(\cdot)$ is a continuous decreasing
function. If $p^{n,t}(\lambda_n^{t+1}) \neq 0$ (i.e. it has not converged), $p^{n,t}(x)$ ($q^{n,t}(x)$, respectively)
is non-decreasing (non-increasing) in $x$, and strictly increasing (strictly
decreasing) at $x = \lambda_n^{t+1}$. From (2.51) it is always true that $p^{n,t}(\lambda_n^{t+1}) = q^{n,t}(\lambda_n^{t+1})$.
We first prove the convergence of the parallel update case in (2.15). For $\forall n$, we
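have

$$
\begin{aligned}
\sum_{k=1}^{K} [a_n^{k,t+1} - a_n^{k,t}]^+
&= \max\Big\{ \sum_{k=1}^{K} [a_n^{k,t+1} - a_n^{k,t}]^+, \sum_{k=1}^{K} [a_n^{k,t+1} - a_n^{k,t}]^- \Big\} && \text{(2.54)} \\
&= \max\{ p^{n,t}(\lambda_n^{t+1}), q^{n,t}(\lambda_n^{t+1}) \} && \text{(2.55)} \\
&\le \max\{ p^{n,t}(\lambda_n^{t}), q^{n,t}(\lambda_n^{t}) \} && \text{(2.56)} \\
&\le \max\Big\{ \sum_{k=1}^{K} \Big[ \sum_{m \neq n} F_{mn}^k (a_m^{k,t} - a_m^{k,t-1}) \Big]^+, \sum_{k=1}^{K} \Big[ \sum_{m \neq n} F_{mn}^k (a_m^{k,t} - a_m^{k,t-1}) \Big]^- \Big\} && \text{(2.57)} \\
&\le \max\Big\{ \sum_{k=1}^{K} \sum_{m \neq n} \big[ F_{mn}^k (a_m^{k,t} - a_m^{k,t-1}) \big]^+, \sum_{k=1}^{K} \sum_{m \neq n} \big[ F_{mn}^k (a_m^{k,t} - a_m^{k,t-1}) \big]^- \Big\} && \text{(2.58)} \\
&= \max\Big\{ \sum_{m \neq n} \sum_{k=1}^{K} \big[ F_{mn}^k (a_m^{k,t} - a_m^{k,t-1}) \big]^+, \sum_{m \neq n} \sum_{k=1}^{K} \big[ F_{mn}^k (a_m^{k,t} - a_m^{k,t-1}) \big]^- \Big\} && \text{(2.59)} \\
&\le \sum_{m \neq n} \max_k |F_{mn}^k| \cdot \Big\{ \sum_{k=1}^{K} [a_m^{k,t} - a_m^{k,t-1}]^+ + \sum_{k=1}^{K} [a_m^{k,t} - a_m^{k,t-1}]^- \Big\} && \text{(2.60)} \\
&= \sum_{m \neq n} 2 \max_k |F_{mn}^k| \cdot \sum_{k=1}^{K} [a_m^{k,t} - a_m^{k,t-1}]^+ && \text{(2.61)}
\end{aligned}
$$

where (2.54) and (2.61) follow from (2.51), (2.55) follows from the definition of
$p^{n,t}$ and $q^{n,t}$ in (2.52) and (2.53), (2.56) is due to Lemma 2.1 in which $x = \lambda_n^t$, (2.57)
follows from the definition of $p^{n,t}$ and $q^{n,t}$, the expression of $a_n^{k,t}$ in (2.15), and the
fact that $[[x]_a^b - [y]_a^b]^+ \le [x - y]^+$ and $[[x]_a^b - [y]_a^b]^- \le [x - y]^-$, (2.58) is due to the
fact that $[x + y]^+ \le [x]^+ + [y]^+$ and $[x + y]^- \le [x]^- + [y]^-$, and (2.60) follows by using
$[\sum_k x_k y_k]^+ \le \sum_k |x_k| |y_k| = \sum_k |x_k| ([y_k]^+ + [y_k]^-) \le \max_k |x_k| \sum_k ([y_k]^+ + [y_k]^-)$.

For user $n$, we define $e_n^t = \sum_{k=1}^{K} [a_n^{k,t} - a_n^{k,t-1}]^+$. Inequality (2.61) can be written
as $e_n^{t+1} \le 2 \sum_{m \neq n} [\mathbf{T}^{\max}]_{mn} e_m^t$, in which $\mathbf{T}^{\max}$ is defined in (2.16).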
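Since $\mathbf{T}^{\max}$ is a nonnegative matrix, by the Perron-Frobenius Theorem [BT97],
there exists a positive vector $\mathbf{w} = [w_1 \ldots w_N]$ such that

$$ \|\mathbf{T}^{\max}\|_{\infty,\mathrm{mat}}^{\mathbf{w}} = \rho(\mathbf{T}^{\max}), \tag{2.62} $$

where $\|\cdot\|_{\infty,\mathrm{mat}}^{\mathbf{w}}$ is the weighted maximum matrix norm defined as

$$ \|\mathbf{A}\|_{\infty,\mathrm{mat}}^{\mathbf{w}} = \max_{i=1,2,\cdots,N} \frac{1}{w_i} \sum_{j=1}^{N} |[\mathbf{A}]_{ij}| w_j, \quad \mathbf{A} \in \mathcal{R}^{N \times N}. \tag{2.63} $$

Define the vectors $\mathbf{e}^{t+1} \triangleq [e_1^{t+1}, e_2^{t+1}, \ldots, e_N^{t+1}]^T$ and $\mathbf{e}^{t} \triangleq [e_1^{t}, e_2^{t}, \ldots, e_N^{t}]^T$. The
set of inequalities in (2.61) can be expressed in vector form as $\mathbf{0} \le \mathbf{e}^{t+1} \le
2\mathbf{T}^{\max} \mathbf{e}^{t}$. By choosing the vector $\mathbf{w}$ that satisfies $\|\mathbf{T}^{\max}\|_{\infty,\mathrm{mat}}^{\mathbf{w}} = \rho(\mathbf{T}^{\max})$ and
applying the weighted maximum norm $\|\cdot\|_{\infty}^{\mathbf{w}}$, we obtain the following

$$ \|\mathbf{e}^{t+1}\|_{\infty}^{\mathbf{w}} \le 2 \|\mathbf{T}^{\max} \mathbf{e}^{t}\|_{\infty}^{\mathbf{w}} \le 2 \|\mathbf{T}^{\max}\|_{\infty,\mathrm{mat}}^{\mathbf{w}} \|\mathbf{e}^{t}\|_{\infty}^{\mathbf{w}}. \tag{2.64} $$

Finally, based on (2.61) and (2.64), it follows that

$$ \max_{n \in \mathcal{N}} \frac{e_n^{t+1}}{w_n} = \|\mathbf{e}^{t+1}\|_{\infty}^{\mathbf{w}} \le 2 \|\mathbf{T}^{\max}\|_{\infty,\mathrm{mat}}^{\mathbf{w}} \|\mathbf{e}^{t}\|_{\infty}^{\mathbf{w}} = 2 \|\mathbf{T}^{\max}\|_{\infty,\mathrm{mat}}^{\mathbf{w}} \cdot \max_{n \in \mathcal{N}} \frac{e_n^{t}}{w_n} = 2 \rho(\mathbf{T}^{\max}) \max_{n \in \mathcal{N}} \frac{e_n^{t}}{w_n}. \tag{2.65} $$

Therefore, if $\|\mathbf{T}^{\max}\|_{\infty,\mathrm{mat}}^{\mathbf{w}} = \rho(\mathbf{T}^{\max}) < \frac{1}{2}$, the best response dynamics in (2.15) is
a contraction with the modulus $2\|\mathbf{T}^{\max}\|_{\infty,\mathrm{mat}}^{\mathbf{w}}$ with respect to the norm $\max_{n \in \mathcal{N}} \frac{e_n}{w_n}$.
We can conclude that the best response dynamics has a unique fixed point $\mathbf{a}^*$
and, given any initial value $\mathbf{a}^0$, the update sequence $\{\mathbf{a}^t\}$ converges to the fixed
point $\mathbf{a}^*$.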
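The role of the weight vector $\mathbf{w}$ can be illustrated numerically. The sketch below assumes a small hypothetical coupling matrix with zero diagonal and positive off-diagonal entries, so that plain power iteration converges to a positive Perron vector. It then evaluates the weighted maximum matrix norm of (2.63) with that vector and compares it with the unweighted maximum row sum. The matrix entries and function names are illustrative assumptions.

```python
def perron_vector(T, iters=500):
    """Power iteration for a nonnegative matrix whose powers become positive."""
    n = len(T)
    w = [1.0] * n
    for _ in range(iters):
        v = [sum(T[i][j] * w[j] for j in range(n)) for i in range(n)]
        scale = max(v)
        w = [x / scale for x in v]
    return w


def weighted_max_norm(A, w):
    """max_i (1 / w_i) * sum_j |A_ij| * w_j, as in (2.63)."""
    n = len(A)
    return max(sum(abs(A[i][j]) * w[j] for j in range(n)) / w[i] for i in range(n))


if __name__ == "__main__":
    T = [[0.0, 0.2, 0.1], [0.3, 0.0, 0.1], [0.1, 0.2, 0.0]]  # hypothetical T^max
    w = perron_vector(T)
    rho = weighted_max_norm(T, w)  # equals the spectral radius for the Perron vector
    plain = weighted_max_norm(T, [1.0, 1.0, 1.0])  # ordinary maximum row sum
    print(rho, plain, 2.0 * rho < 1.0)
```

With the Perron vector as weights, every row of the weighted sum gives the same value, namely the spectral radius, which is never larger than the unweighted maximum row sum. This is why the contraction test is stated in terms of $\rho(\mathbf{T}^{\max})$ rather than a plain row-sum bound.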

In the sequential update case, the convergence result can be established by
using Proposition 1.4 in [BT97]. The key step is to obtain

$$ \frac{e_n^{t+1}}{w_n} \le 2 \rho(\mathbf{T}^{\max}) \max\Big\{ \max_{j < n} \frac{e_j^{t+1}}{w_j}, \max_{j > n} \frac{e_j^{t}}{w_j} \Big\}. \tag{2.66} $$

A simple induction on $n$ yields

$$ \max_{n \in \mathcal{N}} \frac{e_n^{t+1}}{w_n} \le 2 \rho(\mathbf{T}^{\max}) \cdot \max_{n \in \mathcal{N}} \frac{e_n^{t}}{w_n} \tag{2.67} $$

for all $n$. Therefore, inequality (2.61) also holds for the sequential update and
the contraction iteration globally converges to a unique equilibrium. ■


2.8  Appendix  B:  Proof  of  Theorem  2.2 


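If $F_{mn}^k \ge 0$, $\forall m \neq n, k$, the inequalities after (2.59) become

$$
\begin{aligned}
&\max\Big\{ \sum_{m \neq n} \sum_{k=1}^{K} \big[ F_{mn}^k (a_m^{k,t} - a_m^{k,t-1}) \big]^+, \sum_{m \neq n} \sum_{k=1}^{K} \big[ F_{mn}^k (a_m^{k,t} - a_m^{k,t-1}) \big]^- \Big\} && \text{(2.68)} \\
&\le \sum_{m \neq n} \max_k F_{mn}^k \cdot \max\Big\{ \sum_{k=1}^{K} [a_m^{k,t} - a_m^{k,t-1}]^+, \sum_{k=1}^{K} [a_m^{k,t} - a_m^{k,t-1}]^- \Big\} && \text{(2.69)} \\
&= \sum_{m \neq n} \max_k F_{mn}^k \cdot \sum_{k=1}^{K} [a_m^{k,t} - a_m^{k,t-1}]^+. && \text{(2.70)}
\end{aligned}
$$

Similarly, for $F_{mn}^k \le 0$, $\forall m \neq n, k$, we have

$$
\begin{aligned}
&\max\Big\{ \sum_{m \neq n} \sum_{k=1}^{K} \big[ F_{mn}^k (a_m^{k,t} - a_m^{k,t-1}) \big]^+, \sum_{m \neq n} \sum_{k=1}^{K} \big[ F_{mn}^k (a_m^{k,t} - a_m^{k,t-1}) \big]^- \Big\} && \text{(2.71)} \\
&\le \sum_{m \neq n} \max_k \{-F_{mn}^k\} \cdot \max\Big\{ \sum_{k=1}^{K} [a_m^{k,t} - a_m^{k,t-1}]^-, \sum_{k=1}^{K} [a_m^{k,t} - a_m^{k,t-1}]^+ \Big\} && \text{(2.72)} \\
&= \sum_{m \neq n} \max_k \{-F_{mn}^k\} \cdot \sum_{k=1}^{K} [a_m^{k,t} - a_m^{k,t-1}]^+. && \text{(2.73)}
\end{aligned}
$$

Therefore, if $F_{mn}^k \ge 0$, $\forall m \neq n, k$ or $F_{mn}^k \le 0$, $\forall m \neq n, k$, given (C2), the sequence
$\{\mathbf{a}^t\}$ contracts with the modulus $\rho(\mathbf{T}^{\max}) < 1$ under the norm $\max_{n \in \mathcal{N}} \frac{e_n^t}{w_n}$,
and the convergence follows readily. ■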
2.9  Appendix C:  Proof of Theorem 2.3


Let $\|\cdot\|_2^{\mathbf{w}}$ denote the weighted Euclidean norm with weights $\mathbf{w} = [w_1 \ldots w_K]^T$,
i.e. $\|\mathbf{x}\|_2^{\mathbf{w}} = (\sum_i w_i |x_i|^2)^{1/2}$ [HJ81]. Define the simplex

1  K 

S  4  |x  e  nK  :  -  J2  *k  =  1,  xtn  <xk<  xT,  Vfc  =  1,2, . . .  ,/t  }.  (2.74) 

k= 1 

in  which  'Yhkxsffayi  —  1-  The  following  lemma  is  needed  to  prove  Theorem  2.3. 


Lemma  2.2  The  projection  with  respect  to  the  weighted  Euclidean  norm  with 
weights  w,  of  the  K-dimensional  real  vector  — x0  =  —  [x0,i, . . . ,  x0>k]t  onto  the 
simplex  S  defined  in  (2.74),  denoted  by  [— x0]“,  is  the  optimal  solution  to  the 
following  convex  optimization  problem: 


bxo] 


W  A  ‘11 

o  =  arg  mm  x 

5  xes11 


;-xo)i 


(2.75) 


and  takes  the  following  form: 


xh  = 


A 

wk 


^0  ,k 


(2.76) 


where  A  >  0  is  chosen  in  order  to  satisfy  the  constraint  T  Y)k=i  xt  =  1 


Proof  of  Lemma  2.2:  See  Corollary  2  in  [SPB08].  ■ 

For  /i*(-)  defined  in  (2.18),  user  n  updates  its  action  according  to 


=  Z*(a_n,A*)  = 


(— ) 

V  pk  > 


1+ 


(A 


*b 


an 

pk 


Fk  ak 
/  j  mn  m 

m^n 


n,k 


(2.77) 


and  A*  is  chosen  to  satisfy  Y^k=i  °*n  =  Mn.  Dehne  the  vector  update  operator  as 
[BR(a_n)]fc  =  a*fc  and  the  coupling  vector  as 

[Cn(a_n)]fc  &  Finat  (2.78) 

nn  m+n 
with $k \in \{1, \ldots, K\}$. We also define


F'  =  diae  F1  F2  F 

mn  o  \  mn  ?  mn  i  *  *  *  ?  ? 


mn 


(2.79) 


and 


a 


at 


at 


a. 


K 


pi  ’  p2  ’  "  '  ’  pA- 

nn  nn  mm 


(2.80) 


Therefore,  the  coupling  vector  can  be  alternatively  rewritten  as 


Cn(a_n)  —  a!n  +  ^  F'mnam.  (2.81) 

m^n 


Define  a  weight  matrix  W  = 
according  to 

[W]kn 


[wj  ...  w^]  in  which  the  element  [W]/cn 


=  wn  k  = 


(F‘ 
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k  ' 
nm 


1+; 


l^k 


1  (Fk  V 

=1 K1  nnJ 


is  chosen 

(2.82) 


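By Lemma 2.2, we know that the vector update operator $\mathrm{BR}_n(\mathbf{a}_{-n})$ in (2.19) can
be interpreted as the projection of the coupling vector $-\mathbf{C}_n(\mathbf{a}_{-n})$ onto user $n$'s
action set $\mathcal{A}_n$ with respect to $\|\cdot\|_2^{\mathbf{w}_n}$, i.e.

$$ \mathrm{BR}_n(\mathbf{a}_{-n}) = [-\mathbf{C}_n(\mathbf{a}_{-n})]_{\mathcal{A}_n}^{\mathbf{w}_n}. \tag{2.83} $$

Given any $\mathbf{a}^{(1)}, \mathbf{a}^{(2)} \in \mathcal{A}$, we define respectively, for each user $n$, the weighted
Euclidean distances between these two vectors and their projected vectors using
(2.83) as $e_n = \|\mathbf{a}_n^{(2)} - \mathbf{a}_n^{(1)}\|_2^{\mathbf{w}_n}$ and $e_{\mathrm{BR}_n} = \|\mathrm{BR}_n(\mathbf{a}_{-n}^{(1)}) - \mathrm{BR}_n(\mathbf{a}_{-n}^{(2)})\|_2^{\mathbf{w}_n}$. Again,
we first prove the convergence of the parallel update case in (2.15). We have
$\forall n \in \mathcal{N}$,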
(2.86)

(2.87) 

(2.88) 

(2.89) 

(2.90) 


where (2.84) follows from the non-expansion property of the projector $[\cdot]_{\mathcal{A}_n}^{\mathbf{w}_n}$ in the
norm $\|\cdot\|_2^{\mathbf{w}_n}$ (see Proposition 3.2(c) in [BT97]), (2.86) follows from the triangle
inequality [HJ81], and $\mathbf{S}^{\max}$ in (2.90) is defined according to (2.20).


The rest of the proof is similar to the proof after equation (2.61) in Appendix
A. Details are omitted due to space limitations. ■
2.10  Appendix D:  Proof of Theorem 2.4


The beginning part of the proof is the same as the proof of Theorem 2.1. For any
user $n$ with general $f_n^k(\cdot)$, the inequalities after (2.55) become

where (2.91) follows from the definition of $p^{n,t}$ and $q^{n,t}$ and the expression of $a_n^{k,t}$
and $B_n^k(\mathbf{a}_{-n}, \lambda)$ in (2.15) and (2.24), and (2.92) follows from the mean value theorem
for vector-valued functions with $\boldsymbol{\xi}^t = \alpha \mathbf{a}^t + (1 - \alpha) \mathbf{a}^{t-1}$ and $\alpha \in [0, 1]$. By (C4),
it is straightforward to show that the iteration is a contraction by following the
same arguments as in Appendix A. The rest of the proof is omitted. ■


2.11  Appendix  E:  Proof  of  Theorem  2.6 


The  gradient  play  algorithm  in  (2.42)  is  in  fact  a  gradient  projection  algorithm 
with  constant  stepsize  k.  In  order  to  establish  its  convergence,  we  first  need  to 
prove  that  the  gradient  of  the  objective  in  (2.36)  is  Lipschitz  continuous,  with  a 
Lipschitz  constant  given  by  L  >  0,  i.e. 

N  N 

v  ( Un^)  -  v  ( Un (y)) 

n= 1  n= 1 

It is known that a function has a Lipschitz continuous gradient if its Hessian is
bounded in the Euclidean norm.

The Hessian matrix H of $\sum_{n=1}^{N} u_n(a)$ can be decomposed into two matrices:
$H = H_1 + H_2$, in which the elements of matrix $H_1$ are
Recall that the gradient of $g_n^k(\cdot)$ is Lipschitz continuous and it satisfies

$$\| \nabla g_n^k(x) - \nabla g_n^k(y) \| \le L' \, \| x - y \|, \quad \forall n, k, \; x, y \in \mathcal{A}_{-n}.$$


Consequently, we have $\|H_2\|_2 \le NKL'$. As a result, we can estimate the Lipschitz
constant L using the following inequalities

||H || 2  <  || Hx || 2  +  || H2 1|2  <  \/|| Hi || 1 1| Hj_ 

d2ht 


<  sup 

x,n,k 


d2x 


+  NKL' 
+NKL'. 


N  N 


(2.100) 


m=  1  n=l 

We can choose the RHS of (2.100) as the Lipschitz constant L. By Proposition
3.4 in [BT97], we know that if $0 < \kappa < 2/L$, the sequence $a^t$ generated by the
gradient projection algorithm in (2.43) and (2.44) converges to a limit point
at which the KKT conditions in (2.37)-(2.39) are satisfied. ■
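
As an illustration of the stepsize condition, the following minimal sketch runs a gradient projection iteration with constant stepsize on a toy problem. The concave quadratic objective, the box constraint set, and all names (`gradient_projection`, `Q`, `b`) are assumptions chosen for illustration; they are not the utilities in (2.36) or the constraint set of the ACSCG.

```python
import numpy as np

def gradient_projection(grad, project, a0, kappa, iters=500):
    """Iterate a <- project(a + kappa * grad(a)) for a maximization problem."""
    a = np.asarray(a0, dtype=float)
    for _ in range(iters):
        a = project(a + kappa * grad(a))
    return a

# Hypothetical toy objective f(a) = -0.5 a^T Q a + b^T a with Q positive definite.
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, 3.0])

def grad(a):
    return b - Q @ a                      # gradient of the toy objective

def project(a):
    return np.clip(a, 0.0, 1.0)           # Euclidean projection onto the box [0, 1]^2

L = np.linalg.eigvalsh(Q).max()           # Lipschitz constant of the gradient
kappa = 1.0 / L                           # any stepsize in (0, 2/L) is admissible
a_star = gradient_projection(grad, project, a0=[0.0, 0.0], kappa=kappa)
```

At the returned point the iteration is stationary, i.e. `a_star` equals `project(a_star + kappa * grad(a_star))`, which is the fixed-point form of the KKT conditions for this toy problem.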


2.12  Appendix  F:  Proof  of  Theorem  2.7 


We know from the proof of Theorem 2.6 that, under Condition (C7), the gradient
of $\sum_{n=1}^{N} u_n(a)$ is Lipschitz continuous and the inequality in (2.97) holds. Recall
that $u_n(x)$ is continuously differentiable. Therefore, by the descent lemma [BT97], we have

Therefore, in order to prove $\sum_{n=1}^{N} u_n(a^t) \ge \sum_{n=1}^{N} u_n(a^{t-1})$, we only need to show
that

for sufficiently small $\kappa$. Substituting (2.46) into (2.102), we can see that it is
equivalent to

By equation (2.45), we have

and

By the mean value theorem, there exists $\xi \in \mathcal{R}$ such that

Multiplying (2.104) and (2.105) leads to


In the following, we differentiate two cases in which the Lagrange multipliers
$\lambda_n$, $\nu_n^k$, $\nu_n'^k$ take different values.

First of all, if $\lambda_n = \nu_n^k = \nu_n'^k = 0$ for all k, n, equation (2.106) can be simplified
as

On the other hand, suppose $\lambda_n > 0$, $\nu_n^k > 0$, or $\nu_n'^k > 0$ for some k, n. Due to
complementary slackness in (2.38) and (2.39), we know that

As a result, the last term in (2.106) satisfies

Therefore, in both cases, the following inequality holds
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Finally,  we  can  conclude  that  the  inequality  in  (2.103)  holds  for  n  <  y  ■  (—  maxn^ 
supx.  9  )  •  Recall  that  Jacobi  update  requires  n  G  (0, 1].  The  stepsize  n  can 

be  eventually  chosen  as  0  <  k  <  min{-|  ■  (—  maxny  supx  9  1}  ■ 


2.13  Appendix  G:  Upper  Bound  of  p(Sk) 

Denote  1T  =  [1  1  •  •  •  1]T.  If  we  fix  1  —  rfn0  for  Vm,  k,  we  have  lrRfc  =  (1  —  r,^0)lT. 
Note  that  Tk  =  (I-R^)"1  =  I+E*=i(Rfe)k  We  have  lTTfc  =  lT(l+E2=i(Rfe)‘)  = 
1T  +  (1  —  rk0)lTTk  and  lrTfc  =  lr.  Therefore,  |Tfc|i  =  -U-  Since  Fk  = 
and  Tfc  =  I  +  EiE(Rfc)\  we  know  [' Tk]nn  >  1  for  Vra.  Denote  a  diagonal 
matrix  diag(Tfc)  with  the  entries  of  Tk  on  the  diagonal.  Recall  that  [Sfc]mn  =  Fkin 
for  m  7^  n,  and  [Sfc]nn  =  0  for  n  G  J\f.  We  can  conclude  that  p( Sk)  <  |Sfc|oo  < 
|(U)T  -  diag(Tfc)|oo  <  KTTloo  -  1  =  |Tfe|!  -1  =  ^-1. 

rm0 
CHAPTER 3


Stackelberg Equilibrium in Power Control Games

The  previous  chapter  presents  the  model  of  ACSCG  and  provides  several  sufficient 
conditions  for  various  generic  distributed  algorithms.  As  a  special  subclass  of  the 
ACSCG,  the  multi-user  power  control  problem  in  frequency-selective  interference 
channels  was  investigated  from  the  game-theoretic  perspective  in  several  prior 
works,  including  [YGC02,  EPT07,  HBH06,  CYM06,  YL06,  CHC07].  In  these 
multi-user  wideband  power  control  games,  users  are  modeled  as  players  having 
individual  goals  and  strategies.  They  are  competing  or  cooperating  with  each 
other  until  they  agree  on  an  acceptable  resource  allocation  outcome.  Existing 
research  can  be  categorized  into  two  types,  non-cooperative  games  and  cooperative 
games. 

First, the formulation of the multi-user wideband power control problem as
a non-cooperative game has appeared in several recent works [YGC02, EPT07].
An iterative water-filling (IW) algorithm was proposed to mitigate the mutual
interference and optimize the performance without the need for a central
controller [YGC02]. At every decision stage, selfish users deploying this algorithm
try to maximize their achievable rates by water-filling across the whole frequency
band until a Nash equilibrium is reached. Alternatively, self-enforcing protocols
are studied in the non-cooperative scenario, in which incentive compatible
allocations are guaranteed [EPT07]. By imposing punishments in the case of
misbehavior and forcing users to cooperate, efficient, fair, and incentive compatible
spectrum sharing is shown to be possible.

Second, there have also been a number of related works studying dynamic
spectrum management (DSM) in the setting of cooperative games [HBH06, CYM06,
YL06, CHC07]. Two (near-)optimal but centralized DSM algorithms, the Optimal
Spectrum Balancing (OSB) algorithm and the Iterative Spectrum Balancing
(ISB) algorithm, were proposed to maximize a weighted rate-sum across all
users [CYM06, YL06]. OSB has exponential complexity in the number of users.
ISB has only quadratic complexity in the number of users because it implements
the optimization in an iterative fashion. An autonomous spectrum balancing
(ASB) technique was proposed to achieve near-optimal performance autonomously,
without real-time explicit information exchanges [CHC07]. These works focus
on cooperative games, because it is well known that the IW algorithm may lead
to Pareto-inefficient solutions [PPR07], i.e., selfishness is detrimental in the
interference channel.

In short, previous research mainly concentrates on studying the existence
and performance of Nash equilibrium in non-cooperative games and developing
efficient algorithms to approach the Pareto boundary in cooperative games.
However, an important intrinsic dimension of this decentralized multi-user
interaction still remains unexplored. Prior research does not consider the
information that users have about other users, or their potential to improve
their performance when having this information. Hence, the best response
strategy of a selfish user that knows how the competing users respond to
interference still needs to be determined. Moreover, it still needs to be
established whether such strategies can lead to better performance than adopting
the IW algorithm. It is important to look at these scenarios in order to assess the
significance of information availability in terms of its impact on the users’
performance in non-cooperative games, and to show why selfish users have incentives
to learn their environment and adapt their rational response strategies [Hay05].
Intuitively, a “clever” user with more information in this non-cooperative game
should be able to gain additional benefits [Hay07].

Throughout  this  chapter,  we  differentiate  two  types  of  selfish  users  based  on 
their  response  strategies: 

1) Myopic user: A user that always acts to maximize its immediate achievable
rate. It is myopic in the sense that it treats other users’ actions as fixed,
ignores the dependence between its competitors’ actions and its own action, and
determines its response so as to maximize its immediate payoff.

2) Foresighted user: A user that selects its transmission action by considering
the long-term impacts on its performance. It anticipates how the others will
react, and maximizes its performance by considering their reactions. It should be
highlighted that additional information is required to assist the foresighted user
in its decision making.

As opposed to previous approaches considering myopic users [YGC02], we
discuss in this chapter how foresighted users should behave in non-cooperative
power control games. We explicitly show that a strategic user can gain more
benefit if it takes its competitors’ information and response strategies into account.
The concept of Stackelberg equilibrium is adopted in order to characterize the
optimal power control strategy of a foresighted user by considering the response
of its competing users. For the two-user case, we formulate the foresighted user’s
decision making as a bi-level programming problem, show that the optimal
solution is computationally prohibitive, and provide a low-complexity algorithm
based on Lagrangian duality theory.

We also note that there are already some papers applying Stackelberg
equilibrium to allocate the resources in networking [ABE06]. However, the problems
and the proposed solutions in these papers are completely different from those in
this chapter. The focus here is to study the strategic behavior of selfish users, which
has not yet been investigated in multi-user interference channels.

The rest of the chapter is organized as follows. Section 3.1 presents the
non-cooperative game model and introduces the concept of Stackelberg equilibrium.
In Section 3.2, using a simple two-user example, we formulate the foresighted
user’s optimal decision making as a bi-level programming problem and discuss
the computational complexity of its optimal solution. Section 3.3 proposes a
low-complexity dual-based approach and provides the simulation results. Section 3.3
also discusses how the required information can be obtained by the strategic users
and the problem formulation in the general multi-user case. Concluding remarks are
drawn in Section 3.4.

3.1  System  Model 

In this section, we describe the mathematical model of the frequency-selective
interference channel and formulate the non-cooperative multi-user power control
game. We introduce the concept of Stackelberg equilibrium and prove the
existence of this equilibrium in the power control game.

3.1.1  System  Description 

Figure 3.1: Gaussian interference channel model.

Fig. 3.1 illustrates a frequency-selective Gaussian interference channel model.
There are N transmitters and N receivers in the system. Each transmitter and
receiver pair can be viewed as a player (or user). The whole frequency band
is divided into K frequency bins. In frequency bin k, the channel gain from
transmitter i to receiver j is denoted as $H_{ij}^k$, where k = 1, 2, ..., K. Similarly,
denote the noise power spectral density (PSD) that receiver n experiences as $\sigma_n^k$
and player n’s transmit PSD as $P_n^k$. For user n, the transmit PSD is subject to
its power constraint:

$$\sum_{k=1}^{K} P_n^k \le P_n^{\max}. \qquad (3.1)$$

Define $\mathbf{P}_n = \{P_n^1, P_n^2, \ldots, P_n^K\}$ as user n’s power allocation pattern. For a fixed
$\mathbf{P}_n$, if treating interference as noise, user n can achieve the following data rate:

$$R_n = \sum_{k=1}^{K} \log_2 \left( 1 + \frac{P_n^k \, |H_{nn}^k|^2}{\sigma_n^k + \sum_{i \ne n} P_i^k \, |H_{in}^k|^2} \right). \qquad (3.2)$$
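
The rate computation can be sketched numerically as follows. This is a minimal illustration that assumes the rate takes the usual interference-as-noise form, with the signal-to-interference-plus-noise ratio of bin k given by the received signal PSD over the noise PSD plus the sum of the received interfering PSDs. The array layout (`P[n, k]`, `H[i, j, k]`, `sigma[n, k]`) and the function name are hypothetical choices, and the numbers are illustrative only.

```python
import numpy as np

def achievable_rates(P, H, sigma):
    """Rates in bits for all users, treating interference as noise.

    P[n, k]     : transmit PSD of user n in bin k
    H[i, j, k]  : channel gain from transmitter i to receiver j in bin k
    sigma[n, k] : noise PSD at receiver n in bin k
    """
    G = np.abs(H) ** 2                             # power gains |H|^2
    rates = np.zeros(P.shape[0])
    for n in range(P.shape[0]):
        total = (G[:, n, :] * P).sum(axis=0)       # everything received at receiver n
        signal = G[n, n, :] * P[n]                 # desired part
        interference = total - signal              # contribution of the other users
        rates[n] = np.log2(1.0 + signal / (sigma[n] + interference)).sum()
    return rates

# Illustrative two-user, two-bin setup: unit direct gains, cross power gains 0.5.
H = np.ones((2, 2, 2))
H[0, 1] = H[1, 0] = np.sqrt(0.5)
sigma = np.array([[4.0, 1.0], [1.0, 4.0]])
P = np.array([[2.0, 8.0], [8.0, 2.0]])
rates = achievable_rates(P, H, sigma)              # one rate per user
```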

To fully capture the performance tradeoff in the system, the concept of a rate
region is defined as

$$\mathcal{R} = \left\{ (R_1, \ldots, R_N) : \exists \, (\mathbf{P}_1, \ldots, \mathbf{P}_N) \text{ satisfying (3.1) and (3.2)} \right\}. \qquad (3.3)$$
Due to the non-convexity in the capacity expression as a function of power
allocations, the computational complexity of optimal solutions (e.g., exhaustive
search) in finding the rate region is prohibitively high. Existing works
[CYM06, YL06, CHC07] aim to compute the Pareto boundary of this rate region
and provide (near-)optimal performance with moderate complexity. Moreover, it
is noted that cooperation among users is indispensable for this multi-user system
to operate at the Pareto boundary. On the other hand, the interference channel
can also be modeled as a non-cooperative game among multiple competing users.
Instead of solving the optimization problem globally, the IW algorithm models
the users as myopic decision makers [YGC02]. This means that they optimize
their transmit PSD by water-filling and compete to increase their transmission
data rates with the sole objective of maximizing their own performance regardless
of the coupling among users. Under a wide range of realistic channel conditions
[YGC02, SLS07], the existence and uniqueness of the competitive optimal point
(Nash equilibrium) are demonstrated, and it can be obtained by the IW algorithm,
which significantly outperforms the static spectrum management algorithms.

Throughout this chapter, we also concentrate on the non-cooperative game
setting. In the IW algorithm, users are assumed to be myopic, i.e., they update
actions shortsightedly without considering the long-term impacts of taking these
actions. We argue that the myopic behavior can be further improved because it
neglects the coupling nature of players’ actions and payoffs. In contrast with
previous approaches, we study the problem of how a foresighted user should behave
rather than taking myopic actions. This investigation gives some insight
into the following question: why should a strategic user sense its environment and
learn the response strategies of its competitors and, consequently, what is the
benefit that a foresighted user can achieve compared with the myopic case?
Figure 3.2: Stackelberg game: the row player’s payoff is given first in each cell, with the
column player’s payoff following.

To illustrate the foresighted behavior, Fig. 3.2 shows a simple Stackelberg
game [SPG07]. Note that in this game, the row player has a strictly dominant
strategy [FT91], Down. Therefore, the two players will end up with a (Down, Left)
play if the row player is myopic. However, if the row player is aware of the column
player’s coupled reaction, they will end up with an (Up, Right) play, which leads
to an increased payoff for both players. It is worth noticing that additional
information is needed to attain this performance improvement: the row player needs
to know the payoff and the response strategy of the column player. To formulate
how a strategic user can take foresighted actions, we introduce the concept of
Stackelberg equilibrium. The next subsection will define the Stackelberg
equilibrium and show its existence in the power control game.
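
The contrast between myopic and foresighted play can be sketched on a small matrix game. The payoff values below are hypothetical, chosen only so that Down is strictly dominant for the row player while (Up, Right) is better for both players than (Down, Left). They are not claimed to be the values of Fig. 3.2, and the helper names are illustrative.

```python
# Hypothetical payoffs (row player's payoff, column player's payoff).
payoffs = {
    ("Up", "Left"): (1, 0), ("Up", "Right"): (3, 2),
    ("Down", "Left"): (2, 1), ("Down", "Right"): (4, 0),
}
ROWS, COLS = ("Up", "Down"), ("Left", "Right")

def column_best_response(row):
    """Column player's myopic reply to a fixed row action."""
    return max(COLS, key=lambda c: payoffs[(row, c)][1])

def row_best_response(col):
    """Row player's myopic reply to a fixed column action."""
    return max(ROWS, key=lambda r: payoffs[(r, col)][0])

def nash_equilibria():
    """Pure-strategy profiles in which both actions are mutual best responses."""
    return [(r, c) for r in ROWS for c in COLS
            if row_best_response(c) == r and column_best_response(r) == c]

def stackelberg_row_leader():
    """Row player commits first, anticipating the column player's reply."""
    lead = max(ROWS, key=lambda r: payoffs[(r, column_best_response(r))][0])
    return lead, column_best_response(lead)

myopic_outcome = nash_equilibria()
foresighted_outcome = stackelberg_row_leader()
```

With these assumed payoffs the myopic outcome is (Down, Left) and the foresighted outcome is (Up, Right), which requires the row player to know the column player's payoffs and response rule.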

3.1.2  Stackelberg  Equilibrium 

Let $\Gamma = (\mathcal{N}, \mathcal{A}, u)$ represent the power control game, in which user n’s payoff
$u_n$ is its achievable data rate $R_n$ and its action set $\mathcal{A}_n$ is the set of transmit
PSDs satisfying constraint (3.1). Recall that the Nash equilibrium is defined to
be any $(a_1^*, \ldots, a_N^*)$ satisfying

$$u_n(a_n^*, a_{-n}^*) \ge u_n(a_n, a_{-n}^*) \text{ for all } a_n \in \mathcal{A}_n \text{ and } n = 1, \ldots, N, \qquad (3.4)$$

where $a_{-n}^* = (a_1^*, \ldots, a_{n-1}^*, a_{n+1}^*, \ldots, a_N^*)$ [FT91]. The action $a_n^*$ is a best response
(BR) to actions $a_{-n}$ if

$$u_n(a_n^*, a_{-n}) \ge u_n(a_n, a_{-n}), \quad \forall a_n \in \mathcal{A}_n. \qquad (3.5)$$

The set of user n’s best responses to $a_{-n}$ is denoted as $BR_n(a_{-n})$.

The Stackelberg equilibrium is a solution concept originally defined for the
cases where a hierarchy of actions exists between users [FT91]. Only one player
is the leader and the other ones are followers. The leader begins the game by
announcing its action. Then, the followers react to the leader’s action. The
Stackelberg equilibrium prescribes an optimal strategy for the leader if its
followers always react by playing their Nash equilibrium strategies in the smaller
sub-game. For example, in a two-player game, where user 1 is the leader and user
2 is the follower, an action $a_1^*$ is the Stackelberg equilibrium strategy for user 1 if

$$u_1(a_1^*, BR_2(a_1^*)) \ge u_1(a_1, BR_2(a_1)), \quad \forall a_1 \in \mathcal{A}_1. \qquad (3.6)$$

For example, in Fig. 3.2, Up is the Stackelberg equilibrium strategy for the row
player.

Next, we define Stackelberg equilibrium in the general case. Let $NE(a_n)$ be
the Nash equilibrium strategy of the remaining players if player n chooses to play
$a_n$, i.e., $NE(a_n) = a_{-n}$, $\forall a_i = BR_i(a_{-i})$, $a_i \in \mathcal{A}_i$, $i \ne n$.

Definition 3.1 The strategy profile $(a_n^*, NE(a_n^*))$ is a Stackelberg equilibrium
with user n leading iff

$$u_n(a_n^*, NE(a_n^*)) \ge u_n(a_n, NE(a_n)), \quad \forall a_n \in \mathcal{A}_n. \qquad (3.7)$$

If multiple Nash equilibria exist in the followers’ sub-game, the definition of
Stackelberg equilibrium becomes more complicated. Interested readers can refer
to [ABE06, CMS05] for more details. This chapter does not consider this case
and focuses on the channels where a unique Nash equilibrium exists in the
sub-game [SLS07]. In particular, the considered channels satisfy condition (C3). It
has been shown that a unique Nash equilibrium for the power control game exists in
these channels and it can be achieved using the IW algorithm [SPB08, SLS07].

In fact, the requirement of hierarchic actions in the original definition of
Stackelberg equilibrium can be removed in our problem if we consider the repeated
interaction among all the users. Regardless of the initial action order, the
foresighted user can always perform the Stackelberg strategy. Whenever it changes
its transmit PSD, the other myopic users will water-fill with respect to their
updated noise-plus-interference PSDs to gain an immediate increase in transmission
rates until the system converges to an equilibrium. We are interested in the
performance achieved at the steady state. Therefore, the initial action order between
the foresighted user and the myopic users does not influence the final outcome
of this game. Note that initially we assume that a single foresighted user exists
in this game. How the users should decide whether to play foresightedly or
myopically, and the extension to the cases where there are multiple foresighted
users, will be discussed in Section 3.3. The following theorem establishes the
existence of Stackelberg equilibrium in the considered power control game.

Theorem 3.1 Under the considered channel conditions, the Stackelberg equilibrium
always exists in the multi-user power control game.

Proof: Suppose user 1 is the only foresighted user in this game. First, user 1’s
maximal achievable rate in an interference-free environment is

$$R_1^{\max} = \sum_{k=1}^{K} \log_2 \left( 1 + P_1^k \, |H_{11}^k|^2 / \sigma_1^k \right), \qquad (3.8)$$

where $P_1^k = (\lambda - \sigma_1^k / |H_{11}^k|^2)^+$ is the water-filling solution, $(x)^+ = \max(0, x)$, and
$\lambda$ is a constant satisfying the constraint in (3.1) with equality.
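
The single-user water-filling solution can be computed numerically by a bisection search on the water level. The sketch below is a minimal illustration under the assumption that the input vector holds the noise-to-gain ratios of the K bins; the function name `water_fill` and the example numbers are hypothetical.

```python
import numpy as np

def water_fill(noise_to_gain, p_max, iters=100):
    """Return P[k] = max(0, lam - noise_to_gain[k]) with sum(P) = p_max.

    The water level lam is located by bisection: the allocated power is
    non-decreasing in lam, zero at min(noise_to_gain), and at least p_max
    at max(noise_to_gain) + p_max.
    """
    w = np.asarray(noise_to_gain, dtype=float)
    lo, hi = w.min(), w.max() + p_max
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if np.maximum(0.0, lam - w).sum() > p_max:
            hi = lam                      # too much power allocated: lower the level
        else:
            lo = lam                      # budget not exhausted: raise the level
    return np.maximum(0.0, 0.5 * (lo + hi) - w)

# Illustrative numbers: three bins with noise-to-gain ratios 4, 1 and 9.
w = np.array([4.0, 1.0, 9.0])
P = water_fill(w, p_max=10.0)
R_max = np.log2(1.0 + P / w).sum()        # interference-free rate in bits
```

In this example the worst bin stays empty because its noise-to-gain ratio lies above the final water level.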

Second, it has been shown that in the considered channels, the existence and
uniqueness of Nash equilibrium are always guaranteed [SLS07]. In the interference
channel consisting of the N-1 followers, whatever $\mathbf{P}_1 \in \mathcal{A}_1$ user 1 chooses,
they will regard user 1’s transmit PSD as part of the fixed background noise PSD,
i.e., $\sigma_j'^k = \sigma_j^k + |H_{1j}^k|^2 P_1^k$, $j \ne 1$. Since the channel gains in the followers’
sub-game still satisfy the sufficient condition in [SLS07], the convergence to a unique
Nash equilibrium always holds, i.e., a single $NE(a_1)$ exists for $\forall a_1 \in \mathcal{A}_1$.

To summarize, since $R_1^{\max}$ is bounded, and for $\forall a_1 \in \mathcal{A}_1$, the remaining players’
actions will always lead to a Nash equilibrium, we have

$$0 \le u_1(a_1, NE(a_1)) \le R_1^{\max}, \quad \forall a_1 \in \mathcal{A}_1. \qquad (3.9)$$

Therefore, there exists $a_1^* \in \mathcal{A}_1$ such that

$$u_1(a_1^*, NE(a_1^*)) = \sup_{a_1 \in \mathcal{A}_1} \left\{ u_1(a_1, NE(a_1)) \right\}.$$

We can conclude that Stackelberg equilibrium always exists for this power control
game. ■

3.2  Problem  Formulation 

In this section, we study how to achieve the Stackelberg equilibrium in the
two-user case, and formulate the foresighted behavior as a bi-level programming
problem. We analyze the computational complexity of the optimal solution, and show
that the optimum is computationally intractable for the bi-level program. We
start from the simplest two-user version, because it is illustrative for
understanding the interactions emerging among competing users. The extension to the
multi-user case will be discussed in Section 3.3.


3.2.1  A  Bi-level  Programming  Formulation 


The  Stackelberg  equilibrium  applied  to  the  two-user  power  control  game  can  be 
represented  by  a  bi-level  mathematical  problem  [CMS05],  in  which  the  foresighted 
user  acts  as  the  leader  and  the  other  user  behaves  as  the  follower.  The  leader 
chooses  a  transmit  PSD  to  maximize  its  own  benefits  by  considering  the  response 
of  its  follower,  who  reacts  to  the  leader’s  transmit  PSD  by  water-filling  over 
the  entire  frequency  band.  Hence,  the  Stackelberg  equilibrium  can  be  found  by 
solving  the  following  optimization  problem: 


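$$
\begin{array}{lll}
\text{upper-level problem:} & \max\limits_{\mathbf{P}_1} \; \sum\limits_{k=1}^{K} \log_2 \left( 1 + \dfrac{P_1^k}{N_1^k + \alpha_2^k P_2^k} \right) & (3.10.a) \\
& \text{s.t. } \sum\limits_{k=1}^{K} P_1^k \le P_1^{\max}, \; P_1^k \ge 0, & (3.10.b) \\
\text{lower-level problem:} & \mathbf{P}_2 = \arg\max\limits_{\mathbf{P}_2'} \; \sum\limits_{k=1}^{K} \log_2 \left( 1 + \dfrac{P_2'^k}{N_2^k + \alpha_1^k P_1^k} \right) & (3.10.c) \\
& \text{s.t. } \sum\limits_{k=1}^{K} P_2'^k \le P_2^{\max}, \; P_2'^k \ge 0. & (3.10.d)
\end{array}
$$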
The sub-problem in (3.10.a)-(3.10.b) is called the upper-level problem and (3.10.c)-
(3.10.d) corresponds to the lower-level problem. Recall that additional information
is indispensable to formulate this bi-level program. This information includes
the other user’s channel condition $N_2^k$ and $\alpha_1^k$, maximum power constraint $P_2^{\max}$,
and its response strategy, i.e., the IW algorithm. By letting $\mathbf{P}_1$ and $\mathbf{P}_2$ be the
transmit PSDs $\mathbf{P}_1^{NE}$ and $\mathbf{P}_2^{NE}$ of the IW algorithm, we can see that the Nash
equilibrium actually gives a lower bound on the optimal value of the problem in (3.10).
Furthermore, by including the opponent’s reaction in the lower-level problem, the
user can avoid the myopic IW approach and potentially improve its performance. In
addition, as we will show later, user 1’s foresightedness turns out to even improve the
myopic user’s performance. Now we make several illustrative remarks by showing
two simple examples.

Remark 3.1 The Nash equilibrium achieved by the IW algorithm may not solve
the bi-level program (3.10). In other words, there exist other feasible power
allocation schemes that can attain strictly better performance than that of the Nash
equilibrium.

Example 3.1 We consider a two-user system with the parameters N = 2, $N_1^1 =
N_2^2 = 4$, $N_1^2 = N_2^1 = 1$, $\alpha_n^k = 0.5$ for $\forall n, k$, $P_1^{\max} = P_2^{\max} = 10$. In this simple
two-channel scenario, it is easy to derive that $R_1 = \log_2[1 + P_1^1/(8.5 - 0.25 P_1^1)] +
\log_2[1 + (10 - P_1^1)/(1.5 + 0.25 P_1^1)]$ bits. Because $\partial R_1 / \partial P_1^1 < 0$, $R_1$ is maximized
when $P_1^1 = 0$. The achievable rates attained at the Stackelberg equilibrium are
$R_1^{SE} \approx 2.939$ bits and $R_2^{SE} \approx 3.474$ bits. The unique Nash equilibrium is reached
by $\mathbf{P}_1^{NE} = \{2, 8\}$ and $\mathbf{P}_2^{NE} = \{8, 2\}$ and its achievable rates are $R_1^{NE} = R_2^{NE} = \log_2 6.25 \approx
2.644$ bits.
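
The numbers in Example 3.1 can be checked with a short script. This is a sketch under stated assumptions: rates are computed as $\log_2(1 + P/(N + \alpha P'))$ per bin with normalized noise N and cross gain $\alpha$, the leader is assumed to spend its full power budget (as in the expression for $R_1$ above), and the leader's bin-1 power is scanned on a grid. All function and variable names are illustrative.

```python
import numpy as np

NOISE = np.array([[4.0, 1.0],    # user 1: normalized noise in bins 1 and 2
                  [1.0, 4.0]])   # user 2: normalized noise in bins 1 and 2
ALPHA, P_MAX = 0.5, 10.0

def water_fill_two_bins(level, p_max):
    """Water-filling over two bins with noise-plus-interference levels `level`."""
    n1, n2 = level
    if abs(n1 - n2) >= p_max:                 # only the cleaner bin is used
        return np.array([p_max, 0.0]) if n1 < n2 else np.array([0.0, p_max])
    lam = 0.5 * (p_max + n1 + n2)             # common water level
    return np.array([lam - n1, lam - n2])

def rate(user, own, other):
    """Rate in bits of `user` (0 or 1), treating the other signal as noise."""
    return float(np.log2(1.0 + own / (NOISE[user] + ALPHA * other)).sum())

# Nash equilibrium: both users water-fill in turn (iterative water-filling).
p1 = p2 = np.array([5.0, 5.0])
for _ in range(100):
    p1 = water_fill_two_bins(NOISE[0] + ALPHA * p2, P_MAX)
    p2 = water_fill_two_bins(NOISE[1] + ALPHA * p1, P_MAX)
r1_ne, r2_ne = rate(0, p1, p2), rate(1, p2, p1)

# Stackelberg: user 1 scans its bin-1 power, anticipating user 2's water-filling.
def leader_rate(x):
    lead = np.array([x, P_MAX - x])
    return rate(0, lead, water_fill_two_bins(NOISE[1] + ALPHA * lead, P_MAX))

grid = np.linspace(0.0, P_MAX, 1001)
x_se = grid[np.argmax([leader_rate(x) for x in grid])]
p1_se = np.array([x_se, P_MAX - x_se])
p2_se = water_fill_two_bins(NOISE[1] + ALPHA * p1_se, P_MAX)
r1_se, r2_se = rate(0, p1_se, p2_se), rate(1, p2_se, p1_se)
```

Both users obtain a higher rate at the Stackelberg outcome than at the Nash equilibrium in this channel realization.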

Remark 3.2 For some channel realizations, the Nash strategy solves the problem
(3.10). If $\alpha_n^k = 0$ for $\forall n, k$, the upper-level and lower-level problems in bi-level
program (3.10) are reduced to two uncoupled problems and the single-user
water-filling solution can achieve the upper bound in (3.8). In addition, we give a
non-trivial example in which $\alpha_n^k \ne 0$ for $\forall n, k$ and the Nash strategy still solves
the problem in (3.10).

Example 3.2 Set the parameters $N_1^1, N_2^2$ in Example 3.1 to be 6, and keep the
remaining ones unchanged. We have $R_1 = \log_2[1 + P_1^1/(11 - 0.25 P_1^1)] + \log_2[1 +
(10 - P_1^1)/(1 + 0.25 P_1^1)]$ bits. In this channel realization, the Nash equilibrium
coincides with the Stackelberg equilibrium. Both equilibria are reached at $\mathbf{P}_1 =
\{0, 10\}$ and $\mathbf{P}_2 = \{10, 0\}$ and the resulting rates are $R_1 = R_2 = \log_2 11 \approx 3.459$ bits.

Remark 3.3 As opposed to the narrow-band case [AA03], we would like to
highlight that the degrees of freedom in allocating the power across multiple bands are
essential for the foresighted user to improve its performance. Consider the
single-band case in which K = 1. Note that user i’s achievable rate $R_i$ is monotonically
increasing in its transmitted power $P_i$. If users selfishly maximize their achievable
rates, all of them will transmit at their maximum power in the single band,
which results in the unique Nash equilibrium. It is easy to check that it is also
the unique Stackelberg equilibrium and it is also Pareto efficient.

Although these examples provide us with some intuition about the relationship
between NE and SE, we are still interested in computing the Stackelberg
equilibrium in general scenarios. The following subsection will reformulate the bi-level
program into a single-level problem, which helps us to understand the
computational complexity of the Stackelberg equilibrium in the multi-user power control
games.

3.2.2  An  Exact  Single-level  Reformulation 

Bi-level programming problems belong to the class of mathematical programs having
optimization problems as constraints. It is well known that they are intrinsically
difficult to solve [CMS05]. To understand the computational complexity, we first
transform the original bi-level program into a single-level reformulation of the form
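$$\max_{\mathbf{P}_1} \; \sum_{k=1}^{K} \log_2 \left( 1 + \frac{P_1^k}{N_1^k + \alpha_2^k \, g_2^k(\mathbf{P}_1, \mathbf{N}_2, \boldsymbol{\alpha}_1, P_2^{\max})} \right) \quad \text{s.t. } \sum_{k=1}^{K} P_1^k \le P_1^{\max}, \; P_1^k \ge 0, \qquad (3.11)$$

in which $g_2^k(\mathbf{P}_1, \mathbf{N}_2, \boldsymbol{\alpha}_1, P_2^{\max})$ is a function that determines user 2’s allocated
power in the kth channel, $\mathbf{N}_2 = \{N_2^1, N_2^2, \ldots, N_2^K\}$, and $\boldsymbol{\alpha}_1 = \{\alpha_1^1, \alpha_1^2, \ldots, \alpha_1^K\}$.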

Note that the lower-level problem in (3.10) is a standard convex programming
problem. Its optimum is given by $g_2^k(\mathbf{P}_1, \mathbf{N}_2, \boldsymbol{\alpha}_1, P_2^{\max}) = (K_2 - N_2^k - \alpha_1^k P_1^k)^+$,
where $K_2$ is a constant that satisfies $\sum_{k=1}^{K} P_2^k = P_2^{\max}$. In practice, $K_2$ is usually
obtained using numerical (e.g., bisection) methods. In fact, an explicit expression
of $g_2^k(\mathbf{P}_1, \mathbf{N}_2, \boldsymbol{\alpha}_1, P_2^{\max})$ is needed to analytically handle the single-level formulation.
Towards this end, we first define a permutation $\pi : \{1, 2, \ldots, K\} \to \{1, 2, \ldots, K\}$,
which ranks all the channels based on their noise plus interference PSDs and
satisfies

$$\pi(l_1) < \pi(l_2), \quad \text{if } N_2^{l_1} + \alpha_1^{l_1} P_1^{l_1} < N_2^{l_2} + \alpha_1^{l_2} P_1^{l_2}. \qquad (3.12)$$


Then,  we  can  extend  the  results  in  [AAG08],  and  have  the  following  closed-form 
expression: 
where $l$ can be found according to the condition:

We can see that function $g_2^k(\mathbf{P}_1, \mathbf{N}_2, \boldsymbol{\alpha}_1, P_2^{\max})$ ranks all the frequency channels
based on the channel conditions and gradually increases the water-level until the
maximal power constraint is satisfied.
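
The ranking-based computation of the follower's water-filling response can be sketched as follows. This is a standard sorted form of water-filling, written under the assumption that it is equivalent to the closed-form expression referred to above; it is not a transcription of that expression. The function name and the variable `levels` (standing for $N_2^k + \alpha_1^k P_1^k$) are hypothetical.

```python
import numpy as np

def water_fill_sorted(levels, p_max):
    """Closed-form water-filling over noise-plus-interference `levels`.

    Bins are ranked from cleanest to noisiest. If only the l cleanest bins
    are active, the water level is (p_max + sum of their levels) / l; the
    number of active bins is the largest l whose l-th level lies below it.
    """
    levels = np.asarray(levels, dtype=float)
    ranked = np.sort(levels)                          # ranking permutation applied
    counts = np.arange(1, levels.size + 1)
    candidate = (p_max + np.cumsum(ranked)) / counts  # water level for each l
    active = np.nonzero(ranked < candidate)[0].max() + 1
    water_level = candidate[active - 1]
    return np.maximum(0.0, water_level - levels), water_level

# Follower's levels when the leader plays {0, 10} with alpha = 0.5:
# Example 3.1 gives levels {1, 9}; Example 3.2 gives levels {1, 11}.
P2_ex31, level_ex31 = water_fill_sorted([1.0, 9.0], 10.0)
P2_ex32, level_ex32 = water_fill_sorted([1.0, 11.0], 10.0)
```

No bisection is needed here because the water level follows directly from the sorted prefix sums.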

Even though we have the closed-form expression of $g_2^k(\mathbf{P}_1, \mathbf{N}_2, \boldsymbol{\alpha}_1, P_2^{\max})$, the
single-level problem (3.11) is still intractable due to its non-convexity. Generally
speaking, the global optimum can only be found via an exhaustive search. If we
define the granularity in the foresighted user’s transmit power as $\Delta_P$, then the
value of $P_1^k$ can be limited to the set $\{0, \Delta_P, \ldots, P_1^{\max}\}$. By searching all the
possible combinations, the optimum can be found. Hence, such an exhaustive
search in $(P_1^1, \ldots, P_1^K)$ has an overall complexity of $O((P_1^{\max}/\Delta_P)^K)$.

Recently, Lagrangian duality theory has been successfully used to solve
non-convex weighted sum-rate maximization in interference channels with moderate
computational complexity [CYM06, YL06, CHC07]. We notice that the problem
in (3.11) is similar to the problems investigated in these works in that the
optimization variables $\mathbf{P}_1$ also appear in the denominators of the objective
function. The following sections will revisit these dual approaches and show that these
methods cannot reduce the computational complexity of problem (3.11), thereby
demonstrating the challenges involved in optimally computing the Stackelberg
equilibrium.

3.2.3  Lagrangian  Dual  Approach  for  Non-convex  Problems 

We continue studying the simple two-user scenario to introduce the dual method.
In a two-user frequency-selective interference channel, the weighted sum-rate
maximization investigated in [CYM06, YL06, CHC07] is given by

in which $\omega \in [0, 1]$ is a fixed weight. The dual method forms the following
Lagrangian

where $\lambda_1, \lambda_2 \ge 0$ are Lagrangian dual variables. The Lagrangian dual function is
defined as


$$D(\lambda_1, \lambda_2) = \max_{\mathbf{P}_1, \mathbf{P}_2 \succeq 0} L(\mathbf{P}_1, \mathbf{P}_2, \lambda_1, \lambda_2). \qquad (3.17)$$


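Denote the objective function of problem (3.15) as $f(\mathbf{P}_1, \mathbf{P}_2)$; the overall
complexity of an exhaustive search is $O\big(\big(\prod_{i}(P_i^{\max}/\Delta_p)\big)^{K}\big)$. From optimization
theory [BV04], we know that, for arbitrary feasible $\mathbf{P}_1, \mathbf{P}_2$, we have
$f(\mathbf{P}_1, \mathbf{P}_2) \le D(\lambda_1, \lambda_2)$. This leads to
$\min_{\lambda_1, \lambda_2} D(\lambda_1, \lambda_2) \ge \max_{\mathbf{P}_1, \mathbf{P}_2} f(\mathbf{P}_1, \mathbf{P}_2)$, and $\min_{\lambda_1, \lambda_2} D(\lambda_1, \lambda_2)$
provides an upper bound of the optimal value of the problem in (3.15). Generally
speaking, if $f(\mathbf{P}_1, \mathbf{P}_2)$ is non-convex, the duality gap
$\min_{\lambda_1, \lambda_2} D(\lambda_1, \lambda_2) - \max_{\mathbf{P}_1, \mathbf{P}_2} f(\mathbf{P}_1, \mathbf{P}_2)$ is not zero.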

Fig. 3.3 summarizes the three key steps of a dual method, the OSB algorithm
[CYM06, YL06], that can efficiently find the global optimum of the problem in
(3.15). First, for fixed $\lambda_1, \lambda_2$, the maximization of $L(\mathbf{P}_1, \mathbf{P}_2, \lambda_1, \lambda_2)$ over $\mathbf{P}_1, \mathbf{P}_2$
in (3.17) is decomposed into $K$ uncoupled sub-problems, each of which
corresponds to a per-bin optimization. Therefore, the overall complexity of
maximizing $L(\mathbf{P}_1, \mathbf{P}_2, \lambda_1, \lambda_2)$ over $\mathbf{P}_1, \mathbf{P}_2$ is only $O(K \prod_i (P_i^{\max}/\Delta_p))$. Second, it is
shown that, for fixed $n$, the sum power of user $n$'s optimal power allocation in a
multi-carrier system is a monotonic function of $\lambda_n$ (Lemma 1 in [CYM06]). This
property guarantees that the bi-section dual update over $\lambda_1, \lambda_2$ will converge to
the dual optimum. Third, it is also proven that, if the number of frequency
bins $K$ is large enough and $\{H_{mn}^k\}$ and $\{\sigma_n^k\}$ are smooth in the spectral domain,
the optimization problem (3.15) satisfies the so-called "time-sharing property"
(Theorems 1 and 2 in [YL06]), and the duality gap of this non-convex problem
is zero. Combining the three properties, the dual approach can find the
global optimum with a computational complexity of $O(T_1 K \prod_i (P_i^{\max}/\Delta_p))$,
where $T_1$ is the number of iterations needed for the dual update. We can see that
the complexity of the dual approach is greatly reduced compared with that of
the exhaustive search in the primal domain. In addition, it is found in [YL06]
that, if $D(\lambda_1, \lambda_2)$ is approximated using a local maximum of $L(\mathbf{P}_1, \mathbf{P}_2, \lambda_1, \lambda_2)$,
the ISB algorithm can achieve near-optimal performance with a computational
complexity of $O(T_1 T_2 K \sum_i (P_i^{\max}/\Delta_p))$, where $T_2$ is the number of iterations
required for evaluating the local maximum.

Figure 3.3: Key steps of the dual approach of non-convex weighted sum-rate maximization.
[Diagram: $\max f$ (complexity $O((\prod_i (P_i^{\max}/\Delta_p))^K)$) -> $\min D$ (time-sharing property: duality gap = 0) -> $\max L$ (monotonic property: efficient update of dual variables) -> per-bin maximization (uncoupled property: complexity $O(K \prod_i (P_i^{\max}/\Delta_p))$).]
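The first two properties (per-bin decoupling for fixed dual variables, and bisection on the dual variables) can be sketched in a few lines. This is a simplified illustration in the spirit of the dual method, not the OSB implementation of [CYM06, YL06]. The rate model, with normalized noise `n1`, `n2` and normalized cross gains `a1`, `a2`, is an assumption chosen to mirror the denominator structure of (3.11); all function names and the bound `lam_hi` are hypothetical.

```python
import numpy as np

def maximize_lagrangian(n1, n2, a1, a2, w, lam1, lam2, grid):
    """For fixed (lam1, lam2), solve the K per-bin problems independently by grid search."""
    p1g, p2g = np.meshgrid(grid, grid, indexing="ij")
    p1 = np.empty(len(n1))
    p2 = np.empty(len(n1))
    for k in range(len(n1)):
        r1 = np.log2(1.0 + p1g / (n1[k] + a2[k] * p2g))
        r2 = np.log2(1.0 + p2g / (n2[k] + a1[k] * p1g))
        value = w * r1 + (1.0 - w) * r2 - lam1 * p1g - lam2 * p2g
        i, j = np.unravel_index(np.argmax(value), value.shape)
        p1[k], p2[k] = grid[i], grid[j]
    return p1, p2

def dual_search(n1, n2, a1, a2, w, p1_max, p2_max, grid, lam_hi=10.0, iters=20):
    """Nested bisection on (lam1, lam2); lam_hi is assumed large enough to switch all power off."""
    def inner(lam1):
        lo, hi = 0.0, lam_hi
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            _, p2 = maximize_lagrangian(n1, n2, a1, a2, w, lam1, mid, grid)
            if p2.sum() > p2_max:
                lo = mid          # user 2 overspends: raise its price
            else:
                hi = mid
        return maximize_lagrangian(n1, n2, a1, a2, w, lam1, hi, grid)

    lo, hi = 0.0, lam_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        p1, _ = inner(mid)
        if p1.sum() > p1_max:
            lo = mid              # user 1 overspends: raise its price
        else:
            hi = mid
    return inner(hi)
```

Each call to `maximize_lagrangian` costs $K$ times the size of the per-bin grid, which is the source of the $O(K \prod_i (P_i^{\max}/\Delta_p))$ term quoted above. The sketch returns the allocation at the last dual point that met both power budgets.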
Figure 3.4: Complexity and properties of the dual approach of computing the Stackelberg
equilibrium.
[Diagram: primal problem, complexity $O((P_1^{\max}/\Delta_p)^K)$; duality gap may not be 0, but the dual bound is tighter than the single-user water-filling bound; monotonic property: efficient update of dual variables; water-filling function $g_2(\mathbf{P}_1, \mathbf{N}_2, \boldsymbol{\alpha}_1, P_2^{\max})$: complexity $O((P_1^{\max}/\Delta_p)^K)$.]

3.2.4  The Lagrangian Dual Approach for Computing Stackelberg Equilibrium


Now we apply the dual approach to our problem in (3.11) to understand why
the Stackelberg equilibrium in our considered problem is intrinsically difficult to
compute. Fig. 3.4 summarizes the key properties of the dual approach that will be
addressed in the following parts. Denote the objective function of problem (3.11)
as $f'(\mathbf{P}_1)$. Consider its dual objective function $D'(\mu)$ for a fixed Lagrangian
dual variable $\mu$:

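$$D'(\mu) = \max_{\mathbf{P}_1 \succeq 0} L'(\mathbf{P}_1, \mu), \qquad (3.18)$$

in which

$$L'(\mathbf{P}_1, \mu) = \sum_{k=1}^{K} \log_2\left(1 + \frac{P_1^k}{N_1^k + \alpha_2^k g_2^k(\mathbf{P}_1, \mathbf{N}_2, \boldsymbol{\alpha}_1, P_2^{\max})}\right) + \mu\left(P_1^{\max} - \sum_{k=1}^{K} P_1^k\right).$$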


For a given $\mu$, denote the optimal power allocation that maximizes (3.18) as
$\mathbf{P}_1^*(\mu) = \arg\max_{\mathbf{P}_1 \succeq 0} L'(\mathbf{P}_1, \mu)$ and $P_1^{*k}(\mu) = [\mathbf{P}_1^*(\mu)]_k$. The following lemma holds
for $\mathbf{P}_1^*(\mu)$:

Lemma 3.1 $\sum_{k=1}^{K} P_1^{*k}(\mu)$ is monotonically decreasing in $\mu$. In addition, we have
$\lim_{\mu \to \infty} \sum_{k=1}^{K} P_1^{*k}(\mu) = 0$ and $\sum_{k=1}^{K} P_1^{*k}(0) = +\infty$.

Proof. It is easy to see that $\sum_{k=1}^{K} P_1^{*k}(0) = +\infty$. The rest of the proof is the
same as in Lemma 1 in [CYM06]. ■
Figure 3.5: Duality gap for the problem in (3.11).

Fig. 3.5 gives a graphical illustration of the above lemma. Consider a sequence
of optimization problems similar to (3.11). These problems are parameterized
by the constraint imposed on user 1's maximal sum power. The
solid curve in Fig. 3.5 is a plot of the optimal value $\big(\sum_{k=1}^{K} P_1^{*k}, f'(\mathbf{P}_1^*)\big)$ as this
constraint varies. The curve is plotted with $\sum_{k=1}^{K} P_1^{*k}$ on the $x$-axis. The $y$-axis
is located at the point where $\sum_{k=1}^{K} P_1^{*k} = P_1^{\max}$. The intersection of the
curve with the $y$-axis is the optimum of (3.11), i.e. $\max_{\mathbf{P}_1} f'(\mathbf{P}_1)$. For a fixed
$\mu$, by drawing a tangent line with slope $\mu$ to the $\big(\sum_{k=1}^{K} P_1^{*k}, f'(\mathbf{P}_1^*)\big)$ curve and measuring
the intersection of this tangent line with the $y$-axis, the value of $D'(\mu)$ can be
graphically obtained. According to Lemma 3.1, as $\mu$ increases, the sum power
at the tangent point monotonically decreases. We denote $\mu^* = \arg\min_{\mu} D'(\mu)$.
Recall that Lemma 3.1 does not claim the continuity of $\sum_{k=1}^{K} P_1^{*k}(\mu)$ in $\mu$. This
is because the allocated powers in different frequency bins are coupled through the
function $g_2(\mathbf{P}_1, \mathbf{N}_2, \boldsymbol{\alpha}_1, P_2^{\max})$ and the time-sharing property in [YL06] is not
guaranteed for problem (3.11). The discontinuity may lead to a nonzero duality
gap, i.e. at least two tangent points exist on the tangent line in Fig. 3.5 and
they correspond to different power constraints $P_1^-$ and $P_1^+$. If the duality gap
is positive, the following theorem indicates that $D'(\mu^*)$ provides a tighter upper
bound of the achievable rate than $R_1^{\max}$ in (3.8).

Theorem 3.2 If the duality gap is nonzero, i.e. $D'(\mu^*) > \max_{\mathbf{P}_1} f'(\mathbf{P}_1)$, the dual
optimum provides a tighter upper bound of user 1's maximal achievable rate than
the bound in (3.8), i.e. $D'(\mu^*) < R_1^{\max}$.


Proof. As shown in Fig. 3.5, the non-zero duality gap implies that there exist at
least two possible values for $\sum_{k=1}^{K} P_1^{*k}(\mu^*)$, which are denoted as $P_1^-$ and $P_1^+$ and
satisfy $P_1^- < P_1^{\max} < P_1^+$. Denote the optimal power allocations under the
power constraints $P_1^-$ and $P_1^+$ as $\mathbf{P}_1^-$ and $\mathbf{P}_1^+$ respectively. We have

$$D'(\mu^*) = f'(\mathbf{P}_1^-) + \mu^*\left(P_1^{\max} - \sum_{k=1}^{K} P_1^{-,k}\right) = f'(\mathbf{P}_1^+) + \mu^*\left(P_1^{\max} - \sum_{k=1}^{K} P_1^{+,k}\right). \qquad (3.19)$$

Moreover, since $P_1^- < P_1^{\max} < P_1^+$, there exists $0 < \nu < 1$ such that
$P_1^{\max} = \nu P_1^- + (1 - \nu) P_1^+$. Immediately, we get $D'(\mu^*) = \nu f'(\mathbf{P}_1^-) + (1 - \nu) f'(\mathbf{P}_1^+)$.
It corresponds to the time-sharing scenario, in which the power allocation $\mathbf{P}_1^-$ is
adopted for time-fraction $\nu$ and $\mathbf{P}_1^+$ for time-fraction $1 - \nu$. Consider the problem
of allocating user 1's power subject to the maximal power constraint $P_1^{\max}$ in the
interference-free environment. We know that the optimal solution is the
single-user water-filling. Noting that $P_1^{\max} = \nu P_1^- + (1 - \nu) P_1^+$ and $P_1^- \ne P_1^+$, the
aforementioned time-sharing strategy is sub-optimal for this problem. Therefore,
we have

$$D'(\mu^*) = \nu f'(\mathbf{P}_1^-) + (1 - \nu) f'(\mathbf{P}_1^+) \le \nu \sum_{k=1}^{K} \log_2\left(1 + \frac{P_1^{-,k} |H_{11}^k|^2}{\sigma_1^k}\right) + (1 - \nu) \sum_{k=1}^{K} \log_2\left(1 + \frac{P_1^{+,k} |H_{11}^k|^2}{\sigma_1^k}\right) < \sum_{k=1}^{K} \log_2\left(1 + \frac{\tilde{P}_1^{k} |H_{11}^k|^2}{\sigma_1^k}\right) = R_1^{\max}, \qquad (3.20)$$

where $\tilde{P}_1^{k}$ denotes user 1's single-user water-filling allocation under the power
constraint $P_1^{\max}$, and this concludes the proof. ■


By Theorem 3.2, evaluating the dual function leads to a tighter upper bound
on the Stackelberg equilibrium than $R_1^{\max}$. Unfortunately, the computational
complexity of optimally maximizing $L'(\mathbf{P}_1, \mu)$ is still $O((P_1^{\max}/\Delta_p)^K)$.
This is because the term $g_2^k(\mathbf{P}_1, \mathbf{N}_2, \boldsymbol{\alpha}_1, P_2^{\max})$ in the denominator of (3.11) is
also a function of the allocated power $P_1^{k'}$ ($k' \ne k$), which makes it impossible to
decouple the maximization in (3.18) into $K$ independent sub-problems. To
conclude, the complexity of the optimal solution in the dual domain is the same as that of the
primal approach, which again highlights the fact that the Stackelberg equilibrium
is difficult to compute.


3.3  Low-complexity Algorithm, Simulations, and Extensions

In this section, we propose a low-complexity dual algorithm to search for the
Stackelberg equilibrium and examine its achievable performance via extensive numerical
simulations. We also discuss how the strategic users can obtain the required
information and the extensions to general multi-user scenarios.
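Table 3.1: Algorithm 3.1: A low-complexity dual algorithm.

Input: $P_1^{\max}$, $P_2^{\max}$, $N_i^k$, $\alpha_i^k$, $\forall k$
Initialize: $\mathbf{P}_1 = \mathbf{P}_1^{NE}$, $\mu_{\max}$, $\mu_{\min}$
Repeat
    $\mu = \frac{1}{2}(\mu_{\max} + \mu_{\min})$
    Repeat
        for $k = 1$ to $K$,
            set $P_1^k = \arg\max_{P_1^k} \sum_{k=1}^{K} \left\{ \ln\left(1 + \frac{P_1^k}{N_1^k + \alpha_2^k g_2^k(\mathbf{P}_1, \mathbf{N}_2, \boldsymbol{\alpha}_1, P_2^{\max})}\right) - \mu P_1^k \right\}$ by keeping $P_1^1, \ldots, P_1^{k-1}, P_1^{k+1}, \ldots, P_1^K$ fixed.
        end
    until $(P_1^1, \ldots, P_1^K)$ converges
    if $\sum_{k=1}^{K} P_1^k > P_1^{\max}$, $\mu_{\min} = \frac{1}{2}(\mu_{\max} + \mu_{\min})$, else $\mu_{\max} = \frac{1}{2}(\mu_{\max} + \mu_{\min})$.
until $\mu$ converges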


3.3.1  A  Low-Complexity  Dual  Approach 

As we have shown, the dual approach cannot reduce the complexity of finding the global
optimum of problem (3.11). However, inspired by the ISB algorithm [YL06],
we develop an efficient dual approach, which is listed as Algorithm 3.1. The
basic idea of the algorithm is to approximately evaluate $D'(\mu)$ by locally
optimizing $L'(\mathbf{P}_1, \mu)$. For fixed $\mu$, the algorithm finds the optimal $P_1^k$ while keeping
$P_1^1, \ldots, P_1^{k-1}, P_1^{k+1}, \ldots, P_1^K$ fixed, and changes the index $k$ until it converges to
a local maximum of $L'(\mathbf{P}_1, \mu)$. Then the algorithm updates $\mu$ using bi-section
search and repeats the procedure above until convergence is achieved.
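The procedure described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation used for the reported results: the follower is assumed to water-fill against $N_2^k + \alpha_1^k P_1^k$, each per-bin step is a plain grid search, the constant term $\mu P_1^{\max}$ is dropped from the objective, and the function returns the last iterate that met the power budget. All function and parameter names are hypothetical.

```python
import numpy as np

def water_fill(noise, p_max):
    """Single-user water-filling over the given effective noise levels."""
    order = np.argsort(noise)
    ranked = noise[order]
    power = np.zeros_like(noise)
    for m in range(len(noise), 0, -1):
        level = (p_max + ranked[:m].sum()) / m
        if level > ranked[m - 1]:
            power[order[:m]] = level - ranked[:m]
            break
    return power

def nash_start(n1, n2, a1, a2, p1_max, p2_max, rounds=100):
    """Iterative water-filling, used only to produce the starting point for user 1."""
    p1, p2 = np.zeros_like(n1), np.zeros_like(n2)
    for _ in range(rounds):
        p1 = water_fill(n1 + a2 * p2, p1_max)
        p2 = water_fill(n2 + a1 * p1, p2_max)
    return p1

def lagrangian(p1, mu, n1, n2, a1, a2, p2_max):
    """sum_k ln(1 + P1^k / (N1^k + a2^k g2^k(P1))) - mu * sum_k P1^k."""
    p2 = water_fill(n2 + a1 * p1, p2_max)   # follower reacts to the whole vector p1
    return np.sum(np.log(1.0 + p1 / (n1 + a2 * p2))) - mu * p1.sum()

def local_maximum(p1, mu, grid, n1, n2, a1, a2, p2_max, max_sweeps=20):
    """Inner loop: re-optimize one bin at a time on the grid until nothing changes."""
    p1 = p1.copy()
    for _ in range(max_sweeps):
        before = p1.copy()
        for k in range(len(p1)):
            values = []
            for p in grid:
                p1[k] = p
                values.append(lagrangian(p1, mu, n1, n2, a1, a2, p2_max))
            p1[k] = grid[int(np.argmax(values))]
        if np.array_equal(p1, before):
            break
    return p1

def low_complexity_dual(n1, n2, a1, a2, p1_max, p2_max, grid, mu_max=10.0, iters=20):
    """Outer loop: bisection on mu; returns the last allocation that met the power budget."""
    p1 = nash_start(n1, n2, a1, a2, p1_max, p2_max)
    feasible = np.zeros_like(n1)
    mu_min = 0.0
    for _ in range(iters):
        mu = 0.5 * (mu_max + mu_min)
        p1 = local_maximum(p1, mu, grid, n1, n2, a1, a2, p2_max)
        if p1.sum() > p1_max:
            mu_min = mu
        else:
            mu_max = mu
            feasible = p1.copy()
    return feasible
```

Because every evaluation of `lagrangian` recomputes the follower's water-filling response, changing a single bin of user 1 can move user 2's power in every other bin. This is the coupling that prevents an exact per-bin decomposition.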

As discussed in [YL06], the local optimum depends on the initial starting point
and the ordering of iterations. Moreover, the proof of convergence of the whole
algorithm becomes an issue. Algorithm 3.1 sets the Nash equilibrium as the initial
starting point. In most of the experimental settings we have tested, Algorithm 3.1
has been observed to converge to a feasible solution within 10-15 iterations. The
computational complexity of this iterative algorithm is only $O(T_1 T_2 K P_1^{\max}/\Delta_p)$
and it reduces the complexity of the optimal exhaustive search by a factor of
$O\big((P_1^{\max}/\Delta_p)^{K-1}/(T_1 T_2 K)\big)$, which is considerably large for small $\Delta_p$ and large
$K$. Table 3.2 summarizes the computational complexity comparison for user 1
if it adopts different algorithms, in which $T_3$ is the number of iterations required
in the iterative water-filling algorithm.

Table 3.2: User 1's computational complexity for different algorithms.

Algorithm                  Computational complexity
Exhaustive search          $O((P_1^{\max}/\Delta_p)^K)$
Algorithm 3.1              $O(T_1 T_2 K P_1^{\max}/\Delta_p)$
Iterative water-filling    $O(T_3 K)$
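To give a feel for the orders of magnitude involved, the three complexity expressions can be evaluated for one set of values. Here $K = 20$ and $P_1^{\max} = 200$ follow the simulation setup of Section 3.3.2, while $\Delta_p = 1$ and $T_1 = T_2 = T_3 = 15$ are assumed purely for illustration; the function names are hypothetical.

```python
def grid_levels(p_max, delta_p):
    """Number of power steps per bin, p_max / delta_p (assumed to divide evenly)."""
    return p_max // delta_p

def exhaustive_cost(p_max, delta_p, K):
    """Operation count (P_max / delta_p)^K of the exhaustive search."""
    return grid_levels(p_max, delta_p) ** K

def algorithm_31_cost(p_max, delta_p, K, T1, T2):
    """Operation count T1 * T2 * K * P_max / delta_p of the low-complexity dual algorithm."""
    return T1 * T2 * K * grid_levels(p_max, delta_p)

def iterative_water_filling_cost(K, T3):
    """Operation count T3 * K of iterative water-filling."""
    return T3 * K

# delta_p and the iteration counts below are assumed values, not taken from the text.
K, p_max, delta_p, T1, T2, T3 = 20, 200, 1, 15, 15, 15
print(exhaustive_cost(p_max, delta_p, K))            # 200**20, about 1.05e46
print(algorithm_31_cost(p_max, delta_p, K, T1, T2))  # 900000
print(iterative_water_filling_cost(K, T3))           # 300
```

Under these assumed values the exhaustive search is about forty orders of magnitude more expensive than the low-complexity dual algorithm.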

3.3.2  Illustrative  Results 

In this sub-section, we evaluate the performance of Algorithm 3.1 by comparing
it with the IW algorithm. We simulate a wireless system with 20 sub-carriers over
the 6.25-MHz band. We assume that $P_1^{\max} = P_2^{\max} = 200$ and $\sigma_1^k = \sigma_2^k = 0.01$.
To evaluate the performance, we tested $10^5$ sets of frequency-selective fading
channels where a unique Nash equilibrium exists, which are simulated using a
four-ray Rayleigh model with the exponential power profile and 160 ns delay
between two adjacent rays [Rap96]. The simulated power of each ray decreases
exponentially according to its delay. The total power of all rays of $H_{11}$ and $H_{22}$
is normalized as one, and that of $H_{12}$ and $H_{21}$ is normalized as 0.5.
Figure 3.6: User 1's power allocation using different algorithms.
[Two panels: "IW Algorithm" and "Algorithm 1"; x-axis: frequency bins.]

Fig. 3.6 and 3.7 show the power allocations for both users using different
algorithms. In the IW algorithm, each user water-fills the whole frequency band
by regarding its competitor's transmit PSD as background noise until the Nash
equilibrium is achieved. In contrast, user 1 does not water-fill if it adopts
Algorithm 3.1. For example, in Fig. 3.6, user 1 allocates a large amount of power
in frequency bin 3 even though it can gain an immediate increase in $R_1$ by
re-allocating some of its power to frequency bins 5 and 6, where the noise plus
interference PSD is below its water-levels in frequency bins 7-12.

Denote user $i$'s achieved rate by deploying Algorithm 3.1 as $R_i'$. Fig. 3.8
shows the simulated cumulative distribution functions (cdf) of $R_i'/R_i^{NE}$. From
the curve, Algorithm 3.1 achieves a higher rate for the foresighted user in all
the simulated realizations. The average rate improvement that Algorithm 3.1
provides over the IW algorithm is 38%. In addition, it is surprising to find that,
in 95% of the simulation settings, Algorithm 3.1 also results in a higher rate $R_2'$
than $R_2^{NE}$ for the myopic user, and the average rate improvement is 45%. This is
because user 1's Stackelberg strategy mitigates the interference it causes to user 2.

Figure 3.7: User 2's power allocation using different algorithms.
[Two panels: "IW Algorithm" and "Algorithm 1"; x-axis: frequency bins.]

We also simulate the scenarios in which the total power of $H_{12}$ and $H_{21}$ is
normalized as 0.25 and all the other parameters remain the same as above. Fig.
3.9 shows the simulated cdfs of $R_i'/R_i^{NE}$. The average rate improvement for user 1
is 27% and that of user 2 is 32%. It is intuitive that the average rate improvement
decreases when the power of $H_{12}$ and $H_{21}$ decreases, because the interference
coupling between users and the foresighted user's ability to shape the myopic
user's response are both reduced.

Figure 3.8: Cdfs for the ratio of $R_i'/R_i^{NE}$ ($\sum_k |H_{12}^k|^2 = \sum_k |H_{21}^k|^2 = 0.5$).

3.3.3  Information  Acquisition 

Previous  sections  mentioned  that,  in  order  to  play  the  Stackelberg  equilibrium, 
the  additional  information  about  the  competing  user’s  CSI,  maximum  power 
constraint,  and  power  allocation  strategy  is  indispensable.  In  practice,  there  are 
several  possible  methods  to  acquire  this  required  information. 

First,  the  myopic  user  has  the  incentive  to  provide  the  required  information, 
because  its  performance  can  be  greatly  improved  if  the  foresighted  player  knows 
the  myopic  player’s  private  information.  In  the  distributed  setting,  users  can 
individually  decide  whether  or  not  to  play  the  Stackelberg  strategy  based  on  their 
computational  hardware  constraints.  The  user  that  wants  to  behave  myopically 
can  reveal  its  information  to  the  foresighted  user.  This  can  be  viewed  as  the 
user’s  cooperative  behavior  to  avoid  mutual  interference. 


When no information exchange among users is possible, the alternative
way for users to gather this information is through predictive modeling. If the
foresighted user strategically changes its power allocation, it can measure and
model the resulting interference PSD, i.e. estimate the functional expression
of $g_2(\mathbf{P}_1, \mathbf{N}_2, \boldsymbol{\alpha}_1, P_2^{\max})$, without any information exchange among users. For
instance, in the next chapter, we will show that the foresighted user can effectively
model its experienced interference as a linear function of its own allocated power,
formulate a local approximation of the original bi-level program, and substantially
improve both users' achievable rates.

Figure 3.9: Cdfs for the ratio of $R_i'/R_i^{NE}$ ($\sum_k |H_{12}^k|^2 = \sum_k |H_{21}^k|^2 = 0.25$).

3.3.4  Extensions  to  Multi-user  Games 

The two-user formulation can be extended to the general cases in which multiple
users can be myopic or foresighted. The analysis in these cases becomes much
more involved. We denote the number of foresighted users as $n_f$ and the number
of myopic users as $n_m$. We briefly address the two remaining cases as follows.

In the first case, $n_f = 1$, $n_m > 1$. As in (3.11), we can still have the following
single-level formulation:

/  \ 
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k= 1 
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nm+ 1 
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k= 1 


in which $N_i^k = \sigma_i^k/|H_{ii}^k|^2$, $\mathbf{N} = [N_i^k : i = 2, \ldots, n_m + 1, k = 1, \ldots, K]$,
$\boldsymbol{\alpha} = \{\alpha_{ij}^k : i = 1, \ldots, n_m + 1, j = 2, \ldots, n_m + 1, k = 1, \ldots, K\}$, and
$g_n^k(\mathbf{P}_1, \mathbf{N}, \boldsymbol{\alpha}, P_2^{\max}, \ldots, P_{n_m+1}^{\max})$ is the function determining user $n$'s allocated
power in channel $k$. As a general form of the two-user case, problem (3.21) is also
non-convex. It is easy to verify that Lemma 3.1 and Theorem 3.2 still hold.
Although it is difficult to analytically derive
$g_n^k(\mathbf{P}_1, \mathbf{N}, \boldsymbol{\alpha}, P_2^{\max}, \ldots, P_{n_m+1}^{\max})$, we are still able to numerically
evaluate it. Hence, Algorithm 3.1 can be applied in this case by replacing
its line 7 with a numerical search for local maxima of the corresponding Lagrangian.
We simulate some three-user scenarios in which $\sum_{k} |H_{ij}^k|^2 = 0.25$ for $i \ne j$,
$P_i^{\max} = 200$, and $\sigma_i^k = 0.01$, and all the other parameters remain the same as in
Section 3.3.2. Fig. 3.10 shows the simulated cdfs of $R_i'/R_i^{NE}$. The average rate
improvement for user 1 is 34% and that of users 2 and 3 is 10.5%. From Fig. 3.10,
we can see that the Stackelberg strategy also benefits the two myopic users in
more than 83% of the channel realizations.


Figure 3.10: Cdfs for the ratio of $R_i'/R_i^{NE}$ ($\sum_k |H_{ij}^k|^2 = 0.25$, $i \ne j$).

Assume now that we have multiple foresighted users, i.e. $n_f > 1$, $n_m > 1$.
In this case, the single objective function in the original upper-level problem
disappears and it becomes a multi-objective optimization problem. Using
arguments similar to those in Theorem 3.1, we can show that the Nash equilibrium still exists in
the followers' game. For these foresighted users, a reasonable outcome is to choose
an operating point in the set $\mathcal{R}_{n_f} = \{(R_1, \ldots, R_{n_f}) : R_i \ge R_i^{NE}\}$,
where $R_i^{NE}$ is user $i$'s achievable rate if all the users are myopic. This point can
be determined based on the negotiation among the foresighted users. Cooperative
game theory provides many solution concepts, e.g. bargaining, for choosing
the operating point [FT91]. Note that the overall game in this scenario is a
mixture of cooperation and competition in that the cooperation exists among
the foresighted users while myopic players compete with each other. A possible
way of achieving a boundary point of $\mathcal{R}_{n_f}$ is to let some coordinator solve the
following weighted sum-rate maximization and determine the transmitted PSDs
for different foresighted users:

(3.22)

in which $\omega_i > 0$ is user $i$'s weight. Although this problem is generally difficult to
solve optimally, some low-complexity methods similar to Algorithm 3.1 can be
adopted to obtain sub-optimal solutions.

3.4  Concluding  Remarks 

This chapter considers the strategic behavior in determining the transmit
PSD for selfish users sharing a frequency-selective interference channel. We adopt
the game theoretic concept of Stackelberg equilibrium and model the two-user
case as a bi-level programming problem. We show that the Stackelberg equilibrium
is intrinsically difficult to compute and propose a low-complexity approach
based on Lagrangian dual theory. Numerical results show that the strategic user
should avoid the shortsighted Nash strategy and that it can substantially improve both
users' performance if it knows the CSI and response strategy of the competing
user. Operational methods for acquiring the necessary information and extensions
to multi-user scenarios are proposed. Obtaining satisfactory performance with
minimal information exchange when multiple foresighted users exist is identified
as a problem for further investigation.
CHAPTER 4


Conjectural Equilibrium in Power Control Games

For power control games, the previous chapter uses the Stackelberg equilibrium
(SE) formulation to investigate the best response strategy of a selfish user that
knows its myopic opponents' private information, including their channel state
information and power constraints. It was shown that, surprisingly, a foresighted
user playing the SE can improve both its own performance and the performance
of all the other users. These results highlight the significance of information
availability in power control games. However, one key question remains unsolved: how
should a foresighted user acquire its desired information and adapt its response?

First, as opposed to the approach in the previous chapter, which assumes
a foresighted user having perfect knowledge of its competitors' responses to its
actions, we discuss in this chapter how the foresighted user without any such
a priori knowledge can accumulate this knowledge and improve its performance
when participating in the power control game. We propose that the foresighted
user can explicitly model its competitors' response as a function of its power
allocation by repeatedly interacting with the environment and observing the
resulting interference. Second, we introduce the concept of conjectural equilibrium
(CE) proposed by Wellman and others [WH98, FJQ04] to characterize the strategic
behavior of a user that models the response of its myopic competing users,
and the existence of this equilibrium in the power control game is proven. Some
previously adopted solutions, including NE and SE, are shown to be special cases
of the CE. The basic notion of CE was first proposed by Hahn in the context of a
market model [Hah77]. A general multi-agent framework is proposed in [WH98]
to study the existence of and the convergence to CE in market interactions.
Specifically, a strategic user is assumed to model the market price as a linear
function of its desired demand. It is observed that it may be better or worse off
than without modeling, depending on its initial belief. However, we note that
using the linear model is purely heuristic in [WH98]. In contrast to this heuristic
belief formation, we apply CE in the power control game, because it provides a
practical solution concept to approach the performance bound of SE. Finally, we
show that deploying the linear model to form conjectures can suitably explore
the problem structure of the power control game, and therefore, it can lead to
a substantial performance improvement. Practical algorithms are developed to
form accurate beliefs and select desirable power allocation strategies. It is shown
that a foresighted user without any a priori knowledge can effectively learn how
the other users will respond to its actions and guide the system to an operating
point having comparable performance to Algorithm 3.1, where perfect a priori
knowledge is assumed.

The  rest  of  the  chapter  is  organized  as  follows.  Section  4.1  introduces  the 
concept  of  CE  and  Section  4.2  proves  the  existence  of  this  CE  in  the  power  control 
game.  Section  4.3  develops  practical  algorithms  to  form  beliefs  and  approach  CE. 
Numerical  results  are  provided  in  Section  4.4  to  show  that  a  foresighted  user  can 
achieve  substantial  performance  improvement  if  it  models  its  competitors  in  the 
power  control  game.  Concluding  remarks  are  drawn  in  Section  4.5. 
4.1  Conjectural Equilibrium


As discussed in Chapter 3, to find the SE in the power control game, we need
to solve the bi-level programming problem in (3.21), where user 1 is assumed to
be the foresighted user. It should be pointed out that the foresighted user needs
to know the private information $\{N_n^k\}$, $\{\alpha_{in}^k\}$, $\{P_n^{\max}\}$ of all its competitors in
order to formulate the above optimization. The approach in the previous chapter
assumes that the foresighted user has perfect knowledge of this private
information. Importantly, it was shown in Chapter 3 that users' performance is
substantially improved compared with that of the IW algorithm if the foresighted
user plays the SE strategy, even though the remaining users behave myopically.
However, how such a foresighted user should accumulate this required information
remains unsolved. In the remaining part of this chapter, we will show that
the foresighted user can obtain this information and improve its performance
by forming conjectures over the behavior of its competitors through repeated
interaction with the environment.

In a game-theoretic setting, which equilibria will be played is determined
based on the existing assumptions about the players' knowledge and beliefs. For
example, the standard NE solution is a set of strategies where no player has a
unilateral incentive to change its strategy. An implicit underlying assumption
is that each Nash player takes the other players' actions as given. Therefore, it
chooses to myopically maximize its own payoff [FT91]. Another example is that
of an SE strategy, where the foresighted user needs to know the structure of the
resulting $NE(a_n)$ for any $a_n \in \mathcal{A}_n$ and believes that all the remaining players
play the NE strategy. Summarizing, the players operating at equilibrium can be
viewed as decision makers behaving optimally with respect to their beliefs about
the policies adopted by the other players.
To rigorously define the CE, we need to use the reformulation of the strategic
game $\Gamma = (\mathcal{N}, \mathcal{A}, u, \mathcal{S}, s)$ defined in (1.4). $\mathcal{S} = \times_{n \in \mathcal{N}} \mathcal{S}_n$ is the state space, where
$\mathcal{S}_n$ is the part of the state relevant to the $n$th user. Specifically, the state in
the power control game is defined as the interference that users experience. The
utility function $u = \times_{n \in \mathcal{N}} u_n$ is a map from users' state space and actions to real
numbers, $u_n : \mathcal{S}_n \times \mathcal{A}_n \to \mathcal{R}$. The state determination function $s = \times_{n \in \mathcal{N}} s_n$ maps
joint actions to states for each component $s_n : \mathcal{A} \to \mathcal{S}_n$. Each user cannot directly
observe the actions chosen by the others, and each user has some belief about the
state that would result from performing its available actions. The belief function
$\tilde{s} = \times_{n \in \mathcal{N}} \tilde{s}_n$ is defined to be $\tilde{s}_n : \mathcal{A}_n \to \mathcal{S}_n$ such that $\tilde{s}_n(a_n)$ represents the state
that player $n$ believes would result if it selects action $a_n$. Notice that
the beliefs are not expressed in terms of other players' actions and preferences,
and the multi-user coupling in these beliefs is captured directly by individual
users forming conjectures of the effects of their own actions. In non-cooperative
scenarios, each user chooses the action $a_n \in \mathcal{A}_n$ if it believes this action maximizes
its utility.

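Definition 4.1 In game $\Gamma$, a configuration of belief functions $(\tilde{s}_1^*, \ldots, \tilde{s}_N^*)$ and
a joint action $a^* = (a_1^*, \ldots, a_N^*)$ constitute a conjectural equilibrium, if for each
$n \in \mathcal{N}$,

$$\tilde{s}_n^*(a_n^*) = s_n(a_1^*, \ldots, a_N^*) \quad \text{and} \quad a_n^* = \arg\max_{a_n \in \mathcal{A}_n} u_n(\tilde{s}_n^*(a_n), a_n). \qquad (4.1)$$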

From  the  definition,  we  can  see  that,  at  CE,  all  users’  expectations  based 
on  their  beliefs  are  realized  and  each  user  behaves  optimally  according  to  its 
expectation.  In  other  words,  users’  beliefs  are  consistent  with  the  outcome  of  the 
play  and  they  behave  optimally  with  respect  to  their  beliefs.  CE  considers  the 
users' beliefs rather than perfect knowledge of $NE(a_n)$, as SE does, which makes
CE an appropriate solution concept when the perfect knowledge is not available.
The key problem is how to configure the belief functions so that they lead to a
CE with satisfactory performance.
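
The two conditions in (4.1) can be checked mechanically on a small example. The sketch below is only an illustration under assumptions that are not part of the power control game: a hypothetical two-player game with three actions, a state equal to the rival's action, a utility that rewards matching the state, and constant belief functions. The names `state`, `utility`, `constant_belief`, and `is_conjectural_equilibrium` are invented for this example.

```python
from itertools import product

ACTIONS = [0, 1, 2]  # hypothetical finite action set shared by both players


def state(n, joint_action):
    """Toy state determination s_n: player n's state is the rival's action."""
    return joint_action[1 - n]


def utility(s, a):
    """Toy utility u_n(s, a): a player prefers its action to match its state."""
    return -(a - s) ** 2


def constant_belief(c):
    """Belief function that predicts state c regardless of the own action."""
    return lambda a: c


def is_conjectural_equilibrium(beliefs, joint_action):
    """Check both conditions of (4.1) for every player."""
    for n in (0, 1):
        a_n = joint_action[n]
        # Consistency: the believed state equals the realized state.
        if beliefs[n](a_n) != state(n, joint_action):
            return False
        # Optimality: a_n maximizes utility under player n's own belief.
        best = max(utility(beliefs[n](a), a) for a in ACTIONS)
        if utility(beliefs[n](a_n), a_n) < best:
            return False
    return True


# Enumerate all constant-belief configurations and joint actions.
equilibria = [
    (c1, c2, a)
    for c1, c2 in product(ACTIONS, repeat=2)
    for a in product(ACTIONS, repeat=2)
    if is_conjectural_equilibrium((constant_belief(c1), constant_belief(c2)), a)
]
```

In this toy game several belief configurations are self-confirming, which mirrors the remark above that the choice of belief functions determines which CE is reached.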

4.2  Existence  of  CE  in  Power  Control  Games 

In  this  section,  we  discuss  how  to  configure  a  user’s  belief  about  its  experienced 
interference  as  a  linear  function  of  its  transmit  power,  and  show  that  such  CE 
exists  and  it  is  a  relaxation  of  both  NE  and  SE.  We  begin  by  stating  several 
fundamental  assumptions  used  throughout  the  investigation  hereafter. 

Assumption  4.1  There  is  only  one  foresighted  user  modeling  its  competitors’ 
reaction  as  a  function  of  its  allocated  power,  and  all  the  remaining  users  are 
myopic  users  that  deploy  the  IW  algorithm.  Without  loss  of  generality,  we  assume 
that  this  foresighted  user  is  user  1. 

Assumption 4.2 Every user is able to perfectly measure its experienced equivalent noise PSD $\sigma_n^k$ and interference PSD $I_n^k$ in all frequency channels.

Assumption  4.3  Users  2, . . . ,  N  react  to  any  small  variation  in  their  experi¬ 
enced  interference  by  setting  their  power  allocations  according  to  the  water-filling 
strategy. 

Assumption 4.4 In the lower-level problem formed by users $2, \ldots, N$ in (3.21), there always exists a unique NE and the IW algorithm converges to this unique NE. A sufficient condition that guarantees the uniqueness of the NE is condition (C3).

Next, we formally define the concept of stationary interference.

Definition 4.2 The stationary interference that user 1 experiences in the kth channel is the accumulated interference $I_1^k = \sum_{i=2}^{N} \alpha_{i1}^k P_i^k$ when best-response users $2, \ldots, N$ reach their NE in the lower-level problem in (3.21). Note that $I_1^k$ is in fact a function of user 1's power allocation $P_1 = [P_1^1, \ldots, P_1^K]$ in the power control game and it can also be denoted as $I_1^k(P_1)$.

4.2.1  Linear  Belief  of  Stationary  Interference 

As  discussed  before,  both  the  state  space  and  belief  functions  need  to  be  defined 
in  order  to  investigate  the  existence  of  CE.  In  the  market  models  for  pure  ex¬ 
change  economy  [WH98],  the  market  price  is  impacted  by  the  other  consumers’ 
announced  demand.  Therefore,  it  is  natural  to  define  the  state  to  be  the  market 
price  in  such  scenarios.  However,  the  proposed  approach  in  [WH98]  that  mod¬ 
els  and  updates  the  belief  on  the  market  price  as  a  linear  function  of  the  excess 
demand  is  entirely  heuristic.  This  is  not  the  case  in  our  setting,  where  forming 
linear  conjectures  fits  the  natural  structure  of  the  considered  interference  game. 

In  the  power  control  game,  we  define  state  Sn  to  be  the  stationary  interference 
caused  to  user  n,  because  besides  its  own  power  allocation,  its  utility  only  de¬ 
pends  on  the  interference  that  its  competitors  cause  to  it.  Notice  that  the  action 
available to user n is to choose the transmitted power allocations subject to
its  maximum  power  constraint.  By  the  definition  of  belief  function,  we  need  to 
express  the  stationary  interference  as  a  function  of  the  transmitted  power.  As 
we  will  see  later,  it  is  natural  to  deploy  linear  belief  models  due  to  the  linearity 
of  the  caused  stationary  interference  in  terms  of  the  allocated  power,  and  hence, 
forming  such  beliefs  can  lead  to  significant  performance  improvements  because 
they  capture  the  inherent  characteristics  of  the  actual  interference  coupling. 

Define $P_1^{k+}$, $P_1^{k-}$ as $P_1^{k+} = [P_1^1, \ldots, P_1^k + \varepsilon, \ldots, P_1^K]$, $P_1^{k-} = [P_1^1, \ldots, P_1^k - \varepsilon, \ldots, P_1^K]$ for an arbitrarily small positive variation $\varepsilon$ in power. Given user 1's power allocation $P_1$, $NE^k(P_1) = [P_2^k, \ldots, P_N^k]^T$ represents the power that users $2, \ldots, N$ allocate in the kth channel at equilibrium. Vector $\alpha^k = \{\alpha_{ij}^k : i \neq j\}$ contains the channel gains in the kth frequency bin. The indicator function $y = \mathrm{sign}(x)$ is a mapping $\mathbb{R}^{N-1} \to \{0, 1\}^{N-1}$, which is defined to be: $y_i = 1$ if $x_i > 0$, and $y_i = 0$ otherwise. Based on these notations, the following theorem motivates us to develop linear belief functions of stationary interference.


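Theorem 4.1 If the number of frequency bins $K$ is sufficiently large, the first derivative of the stationary interference that user 1 experiences in the kth channel with respect to its allocated power in the mth channel satisfies

$$\frac{\partial I_1^k}{\partial P_1^k} = c\left(\alpha^k, \mathrm{sign}\left(NE^k(P_1)\right)\right), \quad \text{if there does not exist } n \in \{2, \ldots, N\} \text{ satisfying } P_n^k = 0 \text{ and } \lambda_n^k = 0;$$

$$\left.\frac{\partial I_1^k}{\partial P_1^k}\right|_{P_1^k \to P_1^{k+}} = c\left(\alpha^k, \mathrm{sign}\left(NE^k(P_1^{k+})\right)\right), \quad \left.\frac{\partial I_1^k}{\partial P_1^k}\right|_{P_1^k \to P_1^{k-}} = c\left(\alpha^k, \mathrm{sign}\left(NE^k(P_1^{k-})\right)\right), \quad \text{otherwise};$$

$$\frac{\partial I_1^k}{\partial P_1^m} = 0, \quad \text{if } m \neq k,$$

in which $\lambda_n^k$ ($n \in \{2, \ldots, N\}$) is the Lagrange multiplier of $P_n^k \ge 0$ at the optimum of the lower-level problem in (3.21). The function $\mathrm{sign}(\cdot)$ is the indicator of which polyhedron the piece-wise affine water-filling function [SLS07] lies in. $c(\alpha^k, y)$ represents a constant determined by $\alpha^k$ and the non-zero elements of $y$.

Proof: By the definition of $I_1^k$, we have $\frac{\partial I_1^k}{\partial P_1^m} = \sum_{i=2}^{N} \alpha_{i1}^k \frac{\partial P_i^k}{\partial P_1^m}$. We distinguish two cases:
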

1) If there does not exist any $n \in \{2, \ldots, N\}$ satisfying $P_n^k = 0$ and $\lambda_n^k = 0$, i.e., there is a non-zero gap between the interference that users $2, \ldots, N$ experience and their water-levels, it is straightforward to see that $\mathrm{sign}(NE^k(P_1)) = \mathrm{sign}(NE^k(P_1^{k+})) = \mathrm{sign}(NE^k(P_1^{k-}))$.

Without loss of generality, we temporarily assume that $P_n^k > 0$ for $n \in \{2, \ldots, N\}$. When users $2, \ldots, N$ reach the equilibrium, we have from the optimality conditions of the water-filling solution:

$$(I + G) \cdot NE^k(P_1) + g^k P_1^k = \nu,$$

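in which

$$G = \begin{bmatrix} 0 & \alpha_{32}^k & \alpha_{42}^k & \cdots & \alpha_{N2}^k \\ \alpha_{23}^k & 0 & \alpha_{43}^k & \cdots & \alpha_{N3}^k \\ \alpha_{24}^k & \alpha_{34}^k & 0 & \cdots & \alpha_{N4}^k \\ \vdots & \vdots & & \ddots & \vdots \\ \alpha_{2N}^k & \alpha_{3N}^k & \cdots & \alpha_{N-1,N}^k & 0 \end{bmatrix}, \quad g^k = \begin{bmatrix} \alpha_{12}^k \\ \alpha_{13}^k \\ \vdots \\ \alpha_{1N}^k \end{bmatrix}, \quad \nu = \begin{bmatrix} \nu_2 \\ \nu_3 \\ \vdots \\ \nu_N \end{bmatrix}.$$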

$\nu_i$ ($i = 2, \ldots, N$) are the water-levels of all the myopic users. We consider the channel realizations satisfying $\|G\|_2 < 1$, which guarantees that the power control game has a unique NE (Theorem 8 in [SLS07]). Therefore, $I + G$ is invertible. It then follows that

$$NE^k(P_1) = (I + G)^{-1} \nu - (I + G)^{-1} g^k P_1^k. \tag{4.2}$$

We also have $\lim_{K \to \infty} \frac{\partial \nu}{\partial P_1^m} = 0$, because if the number of frequency bins $K$ is sufficiently large, the fluctuation of the water-level is negligible. As a result, we have

$$\frac{\partial I_1^k}{\partial P_1^m} = \frac{\partial \left(h^k \cdot NE^k(P_1)\right)}{\partial P_1^m} = \begin{cases} -h^k (I + G)^{-1} g^k, & \text{if } m = k \\ 0, & \text{otherwise} \end{cases} \tag{4.3}$$

in which $h^k = [\alpha_{21}^k \ \alpha_{31}^k \ \cdots \ \alpha_{N1}^k]$. Note that if $P_n^k = 0$ and $\lambda_n^k > 0$, all the derivations above still apply by removing the nth column and nth row from $G$, $NE^k(P_1)$, $g^k$, $\nu$ correspondingly. Hence, we can conclude that $\frac{\partial I_1^k}{\partial P_1^k}$ is a constant $c(\alpha^k, \mathrm{sign}(NE^k(P_1)))$ that depends on both $\alpha^k$ and the non-zero elements of $NE^k(P_1)$.

2) If there exists $n \in \{2, \ldots, N\}$ satisfying $P_n^k = 0$ and $\lambda_n^k = 0$, the stationary interference caused to user $n$ is the same as its water-level $\nu_n$. Therefore, a sufficiently small increment or decrement $\varepsilon$ in user 1's allocated power $P_1^k$ may cause $\mathrm{sign}(NE^k(P_1^{k+}))$ and $\mathrm{sign}(NE^k(P_1^{k-}))$ to be different, i.e., $NE^k(P_1)$ lies on the boundary between two polyhedra that have different piece-wise affine water-filling functions [SLS07]. We need to treat the left-sided and the right-sided first derivatives separately, and similar conclusions can be derived in the same way as in the first part. ■

Theorem 4.1 indicates that the first derivative with respect to a foresighted user's allocated power in a certain channel is sufficient to capture how the stationary interference varies locally in that channel. We observe from equality (4.2) that

$$I_1^k = h^k \cdot NE^k(P_1) = h^k (I + G)^{-1} \nu - h^k (I + G)^{-1} g^k P_1^k.$$

Therefore, user 1 can define its belief function using the linear form$^1$ $\tilde{I}_1^k = \beta^k - \gamma^k P_1^k$, in which $\gamma^k$ is the estimate of $-\frac{\partial I_1^k}{\partial P_1^k}$ and $\beta^k$ is a constant representing the composite effect of users $2, \ldots, N$'s water-levels $\nu$. This linear characterization of the stationary interference can greatly simplify the implicit functional expression $I_1^k(P_1)$ given by the solution of the lower-level problem (3.21), while maintaining an accurate model of $I_1^k(P_1)$ around the feasible operating point $P_1$.
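
As a numerical illustration of this linearity, the sketch below solves the water-filling optimality condition $(I + G) \cdot NE^k(P_1) + g^k P_1^k = \nu$ for one channel and compares the resulting interference with the closed-form slope and intercept. It is a minimal sketch under assumptions: the three-user gains and water-levels are made-up values, every myopic user is assumed active in the channel (case 1 of the proof), and the water-levels are held fixed. The function names are hypothetical.

```python
import numpy as np


def myopic_equilibrium(G, g, nu, p1):
    """Powers of users 2..N in one channel: solve (I + G) x + g * p1 = nu."""
    n = G.shape[0]
    return np.linalg.solve(np.eye(n) + G, nu - g * p1)


def stationary_interference(h, G, g, nu, p1):
    """Interference seen by user 1: h . NE(p1)."""
    return float(h @ myopic_equilibrium(G, g, nu, p1))


# Hypothetical single-channel example with users 2 and 3 myopic.
G = np.array([[0.0, 0.2],
              [0.3, 0.0]])      # cross gains among users 2 and 3, ||G||_2 < 1
g = np.array([0.4, 0.1])        # gains from user 1 to users 2 and 3
h = np.array([0.5, 0.25])       # gains from users 2 and 3 to user 1
nu = np.array([2.0, 1.5])       # fixed water-levels (assumption)

# Closed-form slope and intercept implied by (4.2).
slope = -float(h @ np.linalg.solve(np.eye(2) + G, g))
intercept = float(h @ np.linalg.solve(np.eye(2) + G, nu))

for p1 in (0.0, 0.5, 1.0):
    print(p1, stationary_interference(h, G, g, nu, p1), intercept + slope * p1)
```

Under these assumptions the two printed columns agree, so $\beta^k$ and $\gamma^k$ correspond to `intercept` and `-slope`.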

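Footnote 1: Note that as long as the channel realization is random, for a fixed $K$, the probability that the left-sided and right-sided derivatives in Theorem 4.1 are not equal is zero. We will assume that the first derivative exists hereafter. If it does not exist, similar results can be derived by treating the left-sided and right-sided first derivatives separately.

Table 4.1: Comparison among NE, SE, and CE in power control games.

Nash Equilibrium
  User 1:           $\max_{P_1 \in A_1} \sum_{k=1}^{K} \log_2\left(1 + \frac{P_1^k}{\sigma_1^k + I_1^k}\right)$
  User 2, ..., N:   $\max_{P_n \in A_n} \sum_{k=1}^{K} \log_2\left(1 + \frac{P_n^k}{\sigma_n^k + I_n^k}\right)$

Stackelberg Equilibrium
  User 1:           $\max_{P_1 \in A_1} \sum_{k=1}^{K} \log_2\left(1 + \frac{P_1^k}{\sigma_1^k + I_1^k(P_1)}\right)$
  User 2, ..., N:   $\max_{P_n \in A_n} \sum_{k=1}^{K} \log_2\left(1 + \frac{P_n^k}{\sigma_n^k + I_n^k}\right)$

Conjectural Equilibrium
  User 1:           $\max_{P_1 \in A_1} \sum_{k=1}^{K} \log_2\left(1 + \frac{P_1^k}{\sigma_1^k + \tilde{I}_1^k}\right)$, with $\tilde{I}_1^k = \beta^k - \gamma^k P_1^k = I_1^k = \sum_{i=2}^{N} \alpha_{i1}^k P_i^k$
  User 2, ..., N:   $\max_{P_n \in A_n} \sum_{k=1}^{K} \log_2\left(1 + \frac{P_n^k}{\sigma_n^k + \tilde{I}_n^k}\right)$, with $\tilde{I}_n^k = I_n^k = \sum_{i=1, i \neq n}^{N} \alpha_{in}^k P_i^k$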

4.2.2  Existence  of  Conjectural  Equilibrium 

Under  the  same  known  sufficient  conditions  discussed  in  [SLS07,  SPB08]  and 
Chapter  3  for  guaranteeing  the  existence  of  NE  and  SE,  the  existence  of  CE  can 
be  proven  by  showing  that  the  first  two  types  of  equilibrium  are  special  cases  of 
CE.  To  this  end,  Table  4.1  compares  the  optimality  conditions  of  the  three  types 
of  equilibria  in  the  power  control  game. 

As shown in Table 4.1, the information requirement for playing various equilibria differs. At NE, each user includes its stationary interference $I_n^k$ as a constant in the optimization, and its action is the best response to $I_n^k$. To play SE, the foresighted user needs to know the functional expression of the stationary interference $I_1^k(P_1)$ such that the bi-level program can be formed. Specifically, the required information includes the system-wide channel state information $\alpha^k$, the noise PSD $\sigma_n^k$, and the individual power constraint $P_n^{\max}$ for $\forall k \in \{1, \ldots, K\}$, $n \in \mathcal{N}$. In contrast, in the case of CE, the above information for playing SE is no longer required and the foresighted user behaves optimally with respect to its beliefs $\tilde{I}_1^k$ on how the stationary interference changes as a function of $P_1^k$.

Theorem  4.2  In  the  power  control  game,  both  the  Nash  equilibrium  and  the 
Stackelberg  equilibrium  are  special  cases  of  conjectural  equilibrium. 


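Proof: The optimization that user 1 solves at CE in Table 4.1 is essentially

$$\max_{P_1} \sum_{k=1}^{K} \log_2\left(1 + \frac{P_1^k}{\sigma_1^k + \beta^k - \gamma^k P_1^k}\right) \tag{4.4}$$

$$\text{s.t. } P_1^k \ge 0, \quad \beta^k - \gamma^k P_1^k \ge 0 \quad \text{and} \quad \sum_{k=1}^{K} P_1^k \le P_1^{\max}.$$
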

In order to show that both NE and SE are special cases of CE, we only need to verify that at NE and SE, user 1's action is optimal with respect to its belief and its belief agrees with its state. First, clearly, NE is a trivial CE with the parameters $\beta^k = \sum_{i=2}^{N} \alpha_{i1}^k P_i^k$, $\gamma^k = 0$ in user 1's belief functions. Next, denote by $P_1^{SE} = [P_1^{1,SE}, \ldots, P_1^{K,SE}]$ the optimal solution of the discretized version of problem (3.21). To prove SE is a CE, we need to find the corresponding $\beta^k$ and $\gamma^k$ and show that SE also solves problem (4.4). Consider the belief function in Table 4.1 with the parameters

$$\beta^k = \left.\left(I_1^k - P_1^k \cdot \frac{\partial I_1^k}{\partial P_1^k}\right)\right|_{P_1 = P_1^{SE}} \quad \text{and} \quad \gamma^k = \left.-\frac{\partial I_1^k}{\partial P_1^k}\right|_{P_1 = P_1^{SE}}.$$

As discussed before, such parameters preserve all the local information of the objective of problem (3.21) around $P_1^{SE}$ into problem (4.4). KKT conditions hold at $P_1^{SE}$ since it solves problem (3.21). A sufficient condition that ensures SE to be a CE is that problem (4.4) is a convex program, because KKT conditions are necessary and sufficient for a convex program to attain its optimum. Appendix H provides a sufficient condition under which problem (4.4) is convex, thereby proving that SE is a special CE if this condition is satisfied. ■

Theorem 4.2 indicates that the two isolated points, NE and SE, are both CE, if parameters $\beta = \{\beta^k\}_{k=1}^{K}$, $\gamma = \{\gamma^k\}_{k=1}^{K}$ are properly chosen. Therefore, CE can be viewed as an operational approach to attain the SE if the system-wide information required for solving SE is not available. This is because only local information, namely the stationary interference $I_1^k$ and its first derivative $\frac{\partial I_1^k}{\partial P_1^k}$, is required to formulate problem (4.4), and this information can be obtained using measurements performed at the receiver.

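In addition, we are interested in the existence of other CEs besides these two points. Denote the parameters of any CE, e.g. NE or SE, as $\beta^* = \{\beta^{k*}\}_{k=1}^{K}$, $\gamma^* = \{\gamma^{k*}\}_{k=1}^{K}$, and the optimal solution of problem (4.4) given parameters $\beta, \gamma$ as $P_1(\beta, \gamma)$. Let $F : \mathbb{R}^K \times \mathbb{R}^K \to \mathbb{R}^K$ be a mapping defined as $F(\beta, \gamma) = \{F^k(\beta, \gamma)\}_{k=1}^{K}$, in which

$$F^k(\beta, \gamma) = h^k \cdot NE^k\left(P_1(\beta, \gamma)\right) - \left(\beta^k - \gamma^k P_1^k(\beta, \gamma)\right), \tag{4.5}$$

so that $F(\beta, \gamma) = 0$ exactly when user 1's belief $\tilde{I}_1^k = \beta^k - \gamma^k P_1^k$ agrees with the actual stationary interference in every channel. The following theorem gives a sufficient condition which ensures that infinitely many CEs exist.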

Theorem 4.3 Let $\Gamma$ be a power control game that satisfies condition (C9). Suppose that all the users form conjectures according to Table 4.1. If there exist open neighborhoods $A \subset \mathbb{R}^K$ and $B \subset \mathbb{R}^K$ of $\beta^*$ and $\gamma^*$ respectively, such that $F(\cdot, \gamma) : A \to \mathbb{R}^K$ is locally one-to-one for any $\gamma \in B$, then $\Gamma$ admits an infinite set of conjectural equilibria.

Proof: See Appendix I.

In summary, Theorems 4.1, 4.2, and 4.3 characterize the existence and structure of conjectural equilibrium in power control games. As shown in Fig. 4.1, NE and SE can both be special cases of CE. Open sets of CE that contain NE and SE may exist in the $\beta$-$\gamma$ plane, and different conjectural equilibria correspond to different values of $\beta$ and $\gamma$. SE attains the maximal data rate that a foresighted user can achieve. According to Theorem 4.2, if the foresighted user properly sets up its parameters $\beta, \gamma$, the solution of CE in problem (4.4) coincides with the solution of SE in problem (3.21). More importantly, as opposed to the SE, in which the knowledge of the system-wide private information is required, CE assumes that the foresighted user knows only its stationary interference and the first derivatives with respect to the allocated power, which greatly simplifies the information acquisition. Therefore, in order to approach the performance upper bound given by SE, this chapter adopts the approach of CE. The next section will develop practical conjecture forming and updating algorithms to select, out of the infinitely many CEs, a desirable power allocation scheme that provides achievable rates comparable with SE.

Figure 4.1: Structure of conjectural equilibria in power control games. (Axis label: $R_1$, rate of the foresighted user.)

4.3  Conjecture-based  Rate  Maximization 

Since Theorem 4.3 shows that infinitely many CEs may exist and SE is the most desirable CE for a foresighted user, the parameters $\beta^k, \gamma^k$ of belief functions should be wisely chosen in order to attain SE as a CE. Moreover, the one-shot game formulation and declarative conclusions in the previous sections provide no hint on how to approach the CE. In practice, it is also important to construct algorithmic mechanisms to attain the desirable CE. To arrive at a CE, a multi-agent learning approach is proposed for the repeated game setting [WH98]. Let $\tilde{s}_{n,t}$ and $a_{n,t}$ denote user $n$'s belief and action at time $t$. In the framework, at time $t$, the users update their beliefs $\tilde{s}_{n,t}$ and select their actions $a_{n,t}$ based on their past observations. If we define learning as the players' dynamic process of forming conjectures about the effects of their actions, CE captures the achieved outcome when consistency of conjectures within and across players emerges.

Similarly, this section proposes that users update their beliefs in the repeated interaction setting and numerically examines their performance. Before going into the technical details, it should be pointed out that convergence of the practical solution to a CE is not the principal goal of our investigation. Instead, the ultimate objective is to compute power allocation strategies that require only local information and achieve rates comparable with SE (which requires global information). In other words, even a power allocation strategy that lies outside the open CE set in Fig. 4.1 is favorable if it improves the performance compared with NE.

Table 4.2 summarizes the dynamic updates of all users' states, belief functions, and optimal actions in the power control game. Specifically, at iteration $t$, users' states $I_{n,t}^k$ are determined by their opponents' power allocation. User 1 updates the parameters $\beta_t^k, \gamma_t^k$ in its belief functions based on its state $I_{1,t}^k$ and allocated power $P_{1,t}^k$, and it also updates its power allocation $P_{1,t+1}$ based on the current operating point $P_{1,t}$ and its belief $\tilde{I}_{1,t}^k$. At the same time, myopic users $2, \ldots, N$ set their beliefs equal to their experienced interference and update their power allocation based on the water-filling strategy. Note that Table 4.2 implicitly assumes that user 1 will update after users $2, \ldots, N$'s IW algorithms converge, such that users $2, \ldots, N$'s power allocations $P_{n,t}$ at time $t$ can be regarded as an equilibrium state. The outcome of this dynamic play is a CE if $\lim_{t \to \infty} P_{n,t}$ exists and $\lim_{t \to \infty} \tilde{I}_{n,t}^k = \lim_{t \to \infty} I_{n,t}^k$. As discussed in the proof of Theorem 4.2, it is equivalent to check the convergence of user 1's updates. We can see from Table 4.2 that user 1 needs to complete two updates at each iteration. The entire procedure in Table 4.2 that enables the foresighted user to build beliefs and improve its performance is named "Conjecture-based Rate Maximization". Appropriate rules for updating beliefs are discussed as follows.

Table 4.2: Dynamic updates of the play.

State $I_{n,t}^k$
  User 1 and User 2, ..., N:   $I_{n,t}^k = \sum_{i=1, i \neq n}^{N} \alpha_{in}^k P_{i,t}^k$

Belief function $\tilde{s}_n : A_n \to S_n$
  User 1:           $\beta_t^k, \gamma_t^k \leftarrow \mathrm{Update}_1\left(I_{1,t}^k, P_{1,t}^k\right)$, $\tilde{I}_{1,t}^k = \beta_t^k - \gamma_t^k P_1^k$
  User 2, ..., N:   $\tilde{I}_{n,t}^k = I_{n,t}^k = \sum_{i=1, i \neq n}^{N} \alpha_{in}^k P_{i,t}^k$

Action $a_{1,t}, \ldots, a_{N,t}$
  User 1:           $P_{1,t+1} \leftarrow \mathrm{Update}_2\left(P_{1,t}, \tilde{I}_{1,t}\right)$
  User 2, ..., N:   $\max \sum_{k=1}^{K} \log_2\left(1 + \frac{P_n^k}{\sigma_n^k + \tilde{I}_{n,t}^k}\right)$

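Update1: $\beta_t^k, \gamma_t^k$

Note that we have $I_1^k = h^k (I + G)^{-1} \nu - h^k (I + G)^{-1} g^k P_1^k$ from Theorem 4.1, user 1's belief function takes the form $\tilde{I}_1^k = \beta^k - \gamma^k P_1^k$, and it satisfies $\tilde{I}_1^k = I_1^k$ at CE for any $k \in \{1, \ldots, K\}$. As discussed in the previous section, by setting the parameters $\beta^k = I_1^k - P_1^k \cdot \frac{\partial I_1^k}{\partial P_1^k}$ and $\gamma^k = -\frac{\partial I_1^k}{\partial P_1^k}$, we can preserve all the local information of the original SE problem (3.21) around the current feasible operating point $P_{1,t}$. Therefore, we can update $\beta_t^k$ and $\gamma_t^k$ using

$$\beta_t^k = \left.\left(I_1^k - P_1^k \cdot \frac{\partial I_1^k}{\partial P_1^k}\right)\right|_{P_1 = P_{1,t}} \quad \text{and} \quad \gamma_t^k = \left.-\frac{\partial I_1^k}{\partial P_1^k}\right|_{P_1 = P_{1,t}}.$$

By Assumption 4.3, user 1 can approximate the parameters using

$$\frac{\partial I_1^k}{\partial P_1^k} \approx \frac{I_1^k\left(\{P_1^k + \varepsilon\} \cup P_1^{-k}\right) - I_1^k\left(\{P_1^k - \varepsilon\} \cup P_1^{-k}\right)}{2\varepsilon}$$

for small $\varepsilon$, in which $P_1^{-k} = \{P_1^1, \ldots, P_1^{k-1}, P_1^{k+1}, \ldots, P_1^K\}$.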
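
The central-difference rule above can be sketched as follows. This is only an illustration under assumptions: `measure_interference` is a hypothetical callable that returns the length-$K$ vector of stationary interference user 1 observes after the myopic users have re-converged for a given allocation, and the perturbed allocation is assumed to stay feasible (in a channel with zero power a one-sided difference would be needed instead).

```python
import numpy as np


def update_beliefs(measure_interference, p1, eps=1e-3):
    """Estimate (beta, gamma) of the linear belief I~ = beta - gamma * P.

    measure_interference(p) is assumed to return the stationary interference
    in all K channels once users 2..N have converged for allocation p.
    """
    p1 = np.asarray(p1, dtype=float)
    num_channels = p1.size
    i_now = np.asarray(measure_interference(p1), dtype=float)
    deriv = np.empty(num_channels)
    for k in range(num_channels):
        step = np.zeros(num_channels)
        step[k] = eps
        # Central difference in channel k only; other channels stay fixed.
        i_plus = measure_interference(p1 + step)[k]
        i_minus = measure_interference(p1 - step)[k]
        deriv[k] = (i_plus - i_minus) / (2.0 * eps)
    gamma = -deriv                 # gamma^k = -dI/dP at the operating point
    beta = i_now - p1 * deriv      # beta^k = I - P * dI/dP at the operating point
    return beta, gamma
```

With these parameters the belief reproduces both the measured interference and its slope at the operating point, since $\beta^k - \gamma^k P_{1,t}^k = I_1^k$ there.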

After Update1 in each iteration, user 1 needs to solve problem (4.4). If Theorem 4.2's assumption is not satisfied, problem (4.4) belongs to the class of non-convex optimization, which is generally hard to solve, and standard optimization algorithms can only be used to determine local maxima [BV04]. However, in this application, we are able to show that, as long as the number of frequency bins $K$ is sufficiently large, problem (4.4) satisfies the time-sharing condition [YL06], and its global optimum can be efficiently computed.


Definition 4.3 Consider an optimization problem with the general form:

$$\max \sum_{k=1}^{K} o_k(x_k), \quad \text{s.t. } \sum_{k=1}^{K} c_k(x_k) \le P, \tag{4.6}$$

where $o_k(x_k)$ are objective functions that are not necessarily concave, and $c_k(x_k)$ are constraint functions that are not necessarily convex. Power constraints are denoted by $P$. Let $x_k^*$ and $y_k^*$ be optimal solutions to the optimization problem (4.6) with $P = P_x$ and $P = P_y$, respectively. An optimization problem of the form (4.6) is said to satisfy the time-sharing condition [YL06] if for any $0 \le \nu \le 1$, there always exists a feasible solution $z_k$ such that $\sum_{k=1}^{K} c_k(z_k) \le \nu P_x + (1 - \nu) P_y$ and $\sum_{k=1}^{K} o_k(z_k) \ge \nu \sum_{k=1}^{K} o_k(x_k^*) + (1 - \nu) \sum_{k=1}^{K} o_k(y_k^*)$.


Theorem 4.4 As the total number of sub-carriers $K$ goes to infinity, problem (4.4) satisfies the time-sharing condition.

Table 4.3: Algorithm 4.3: A dual method that solves problem (4.4).

Input: $\{\sigma_1^k\}$, $\{\beta_t^k\}$, $\{\gamma_t^k\}$.
Initialize: $\eta_{\min}$, $\eta_{\max}$, $\eta_0 = (\eta_{\min} + \eta_{\max})/2$, $l = 0$
Repeat
    set $P_1' = [P_1'^1 \ldots P_1'^K]$ where
    $P_1'^k = \arg\max_{P_1^k \in \mathrm{dom} f_{\sigma_1^k, \beta_t^k, \gamma_t^k}} \left[\log_2\left(1 + \frac{P_1^k}{\sigma_1^k + \beta_t^k - \gamma_t^k P_1^k}\right) - \eta_l P_1^k\right]$
    if $\sum_k P_1'^k < P_1^{\max}$, $\eta_{\max} = \eta_l$; else $\eta_{\min} = \eta_l$.
    $\eta_{l+1} = (\eta_{\min} + \eta_{\max})/2$, $l = l + 1$
until $\eta_l$ converges


Proof: Specifically, for problem (4.4), $o_k(x_k) = f_{\sigma_1^k, \beta^k, \gamma^k}(P_1^k)$, $c_k(x_k) = P_1^k$, $P = P_1^{\max}$. First, as the total number of sub-carriers $K$ goes to infinity, consider $\sigma_n^k$ and $\alpha_{jn}^k$ as $\sigma_n(k)$ and $\alpha_{jn}(k)$ that are continuous functions of $k$. By rule Update1, $\beta_t(k)$ and $\gamma_t(k)$ are piece-wise continuous, because Theorem 4.3 proves that $P_1(\beta, \gamma)$ and $F(\beta, \gamma)$ are continuous in $(\beta, \gamma)$. With the piece-wise continuity of $\beta(k)$, $\gamma(k)$, it is easy to check that the time-sharing condition holds by following the proof of Theorem 2 in [YL06]. ■


Update2: $P_{1,t+1}$

It is shown in [YL06] that, if the optimization problem satisfies the time-sharing property, then it has a zero duality gap, which leads to efficient numerical algorithms that solve the non-convex problem in the dual domain. Consider the dual objective function $d(\eta) = \sum_{k=1}^{K} \max_{P_1^k} \left\{ f_{\sigma_1^k, \beta^k, \gamma^k}(P_1^k) - \eta P_1^k \right\} + \eta P_1^{\max}$. Since $d(\eta)$ is convex, a bisection or gradient-type search over the Lagrangian dual variable $\eta$ is guaranteed to converge to the global optimum. Specifically, Algorithm 4.3 summarizes such a dual method that solves the non-convex problem (4.4) using the bisection update. As long as the time-sharing condition is satisfied, Algorithm 4.3 converges to the global optimum. Hence, we can always solve problem (4.4) regardless of its convexity.

Table 4.4: Conjecture-based rate maximization.

Initialize: $t = 0$, $P_{1,0} = P_1^{NE}$
Repeat
    I. $\beta_t, \gamma_t \leftarrow \mathrm{Update}_1\left(I_{1,t}, P_{1,t}\right)$
    II. $P_{1,t+1} \leftarrow \mathrm{Update}_2\left(P_{1,t}, \tilde{I}_{1,t}\right)$, which includes
        1) Consider problem $\max \sum_{k=1}^{K} f_{\sigma_1^k, \beta_t^k, \gamma_t^k}(P_1^k)$, s.t. $P_1^k \in \mathrm{dom} f_{\sigma_1^k, \beta_t^k, \gamma_t^k}$ and $\sum_{k=1}^{K} P_1^k \le P_1^{\max}$.
        2) Use Algorithm 4.3 to calculate the global optimum $P_1'$ of the above problem.
        3) Search in the interval $\nu P_{1,t} + (1 - \nu) P_1'$ ($0 \le \nu \le 1$) and find in the interval the power allocation $P_1''$ that maximizes user 1's actual achievable rate $R_1$.
        4) $P_{1,t+1} \leftarrow P_1''$, $t \leftarrow t + 1$.
until no improvement can be made.
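
The bisection over the dual variable in Algorithm 4.3 can be sketched as below. This is a minimal sketch rather than the procedure's definitive implementation: the per-channel maximization is done by brute-force search over a uniform grid (a substitution chosen here because the per-channel objective need not be concave), the noise PSD is assumed strictly positive, and the function name, grid size, and tolerance are hypothetical.

```python
import numpy as np


def solve_belief_problem(sigma, beta, gamma, p_max, grid=2001, tol=1e-9):
    """Bisection on eta for max sum_k log2(1 + P/(sigma + beta - gamma*P))."""
    sigma, beta, gamma = (np.asarray(x, dtype=float) for x in (sigma, beta, gamma))
    num_channels = sigma.size

    # Per-channel domain: 0 <= P <= p_max and beta - gamma * P >= 0.
    upper = np.full(num_channels, float(p_max))
    pos = gamma > 0
    upper[pos] = np.minimum(p_max, beta[pos] / gamma[pos])

    # cand[k, j] is the j-th candidate power in channel k.
    cand = upper[:, None] * np.linspace(0.0, 1.0, grid)[None, :]
    denom = sigma[:, None] + beta[:, None] - gamma[:, None] * cand
    rate = np.log2(1.0 + cand / denom)

    def best_response(eta):
        """Per-channel maximizer of rate - eta * P over the grid."""
        idx = np.argmax(rate - eta * cand, axis=1)
        return cand[np.arange(num_channels), idx]

    # eta_max is large enough that allocating zero power is optimal everywhere.
    eta_min = 0.0
    eta_max = 1.01 * np.max(rate[:, 1:] / np.maximum(cand[:, 1:], 1e-300))
    while eta_max - eta_min > tol:
        eta = 0.5 * (eta_min + eta_max)
        if best_response(eta).sum() < p_max:
            eta_max = eta      # power budget not exhausted: lower the price
        else:
            eta_min = eta      # budget exceeded or met: raise the price
    # The allocation at eta_max always respects the total power constraint.
    return best_response(eta_max), eta_max
```

When all $\gamma^k = 0$ the per-channel objective is concave and the result should approach the classical water-filling allocation, up to the grid resolution, which gives a simple sanity check for the sketch.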

Table 4.4 summarizes the procedure of the algorithm "Conjecture-based Rate Maximization" (CRM). Next, we make several remarks about this algorithm. First, since we want to achieve better performance than NE, the initial operating point $P_{1,0}$ is set to be the power allocation strategy $P_1^{NE}$ that user 1 will choose if it adopts the IW algorithm. Second, in Update2, the global optimum $P_1'$ is not directly used to update $P_{1,t+1}$. As shown in Fig. 4.2, this is because problem (4.4) is only a local approximation at $P_{1,t}$ of the original SE problem (3.21) that we want to solve. Using $P_1'$ to update $P_{1,t+1}$ may decrease the actual achievable rate $R_1$ if a mismatch between problems (3.21) and (4.4) exists for the solution $P_1'$. Therefore, Update2 adopts a line search and uses the transmit PSD that lies on the segment between $P_{1,t}$ and $P_1'$ and maximizes the actual achievable rate to update $P_{1,t+1}$. This guarantees that the achievable rate will not decrease after each iteration. Last, CRM stops after a limited number of iterations, but it is not guaranteed to converge to a CE. This is because the first step in Update2 may give $P_1' \neq P_{1,t}$ while the line search returns $P_{1,t+1} = P_{1,t}$. However, if $P_{1,t} = P_1'$, CRM converges to $P_1'$ and the resulting outcome is a CE.

Figure 4.2: Mismatch between problem (3.21) and (4.4).
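
The line-search step of Update2 can be sketched as follows. It is an illustrative sketch only: `actual_rate` is a hypothetical callable that returns user 1's measured rate for a given allocation (after the myopic users have re-converged), and the uniform grid over the mixing weight is an assumption, since the text does not specify how the interval is searched.

```python
import numpy as np


def line_search_update(p_current, p_candidate, actual_rate, num_points=101):
    """Pick the best point on the segment between p_current and p_candidate.

    The current allocation is the fallback, so the returned rate is never
    lower than the rate at p_current.
    """
    p_current = np.asarray(p_current, dtype=float)
    p_candidate = np.asarray(p_candidate, dtype=float)
    best_p, best_rate = p_current, actual_rate(p_current)
    for v in np.linspace(0.0, 1.0, num_points):
        p = v * p_current + (1.0 - v) * p_candidate
        r = actual_rate(p)
        if r > best_rate:          # strict improvement only
            best_p, best_rate = p, r
    return best_p, best_rate
```

Because only strict improvements are accepted, the sketch returns the current allocation unchanged whenever no point on the segment does better, which corresponds to the stopping rule "until no improvement can be made" in Table 4.4.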

4.4  Simulation  Results 

This  section  compares  the  performance  of  CRM  with  the  IW  algorithm  and  Algo¬ 
rithm  3.1  that  searches  SE  assuming  perfect  knowledge  of  its  opponent’s  private 
information.  We  simulate  a  system  with  50  sub-carriers  over  the  15-MHz  band. 
We  consider  frequency-selective  channels  using  a  four-ray  Rayleigh  model  with 
the  exponential  power  profile  and  60  ns  root  mean  square  delay  spread.  The 
power  of  each  ray  is  decreasing  exponentially  according  to  its  delay. 
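
A channel realization of this kind might be generated as sketched below. Several details are assumptions that the text does not state: the rays are spaced by one sample period (the inverse of the bandwidth), the exponential decay constant is taken equal to the RMS delay spread, and the per-subcarrier gains are the squared magnitudes of the discrete Fourier transform of the ray coefficients. The function name and parameters are hypothetical.

```python
import numpy as np


def frequency_selective_gains(num_subcarriers=50, bandwidth_hz=15e6, num_rays=4,
                              rms_delay_s=60e-9, total_power=1.0, rng=None):
    """Return per-subcarrier power gains and the average ray power profile."""
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: rays arrive one sample period apart.
    ray_delays = np.arange(num_rays) / bandwidth_hz
    # Exponentially decaying average power, normalized to total_power.
    profile = np.exp(-ray_delays / rms_delay_s)
    profile *= total_power / profile.sum()
    # Rayleigh fading: independent complex Gaussian coefficient per ray.
    rays = np.sqrt(profile / 2.0) * (
        rng.standard_normal(num_rays) + 1j * rng.standard_normal(num_rays)
    )
    # Zero-padded FFT gives the frequency response on each subcarrier.
    freq_response = np.fft.fft(rays, n=num_subcarriers)
    return np.abs(freq_response) ** 2, profile
```

For example, a direct channel would use `total_power=1.0` and a cross channel in the two-user setup below would use `total_power=0.5`.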

We first simulate the two-user scenario with $P_1^{\max} = P_2^{\max} = 200$ and $\sigma_1^k = \sigma_2^k = 0.01$. The total power of all rays of $H_{11}$ and $H_{22}$ is normalized as one, and that of $H_{12}$ and $H_{21}$ is normalized as 0.5. Fig. 4.3 shows an example of user 1's power allocations when deploying different algorithms under the same conditions. In the IW algorithm, user 1 water-fills the whole frequency band by regarding its competitor's interference as background noise. In contrast, user 1 will not water-fill if choosing CRM and Algorithm 3.1. It avoids the myopic behavior and improves its performance by explicitly considering the stationary interference caused by its opponent.

Figure 4.3: User 1's power allocation using different algorithms.

To evaluate the performance, we tested $10^5$ sets of frequency-selective fading channels that satisfy Assumption 4.4. Denote user $i$'s achievable rate using CRM, IW, and Algorithm 3.1 as $R_i'$, $R_i^{NE}$, and $R_i^{SE}$, respectively. Fig. 4.4 shows the simulated cumulative probability of the ratio of $R_i'$ over $R_i^{NE}$ and $R_i^{SE}$. The curve indicates that there is a probability of 59% that CRM returns the same power allocation strategy as IW. On the other hand, the average improvement for user 1 of CRM over IW is 24%, which achieves almost the same performance as Algorithm 3.1. As shown in Fig. 4.4, $R_1'/R_1^{SE}$ is distributed symmetrically with respect to $R_1' = R_1^{SE}$. CRM improves on average user 2's data rate by 29% over IW, which is smaller than the improvement of Algorithm 3.1. As in the previous chapter, in very few cases, CRM results in a rate $R_2'$ smaller than $R_2^{NE}$ in the IW algorithm.

Figure 4.4: Cdfs of $R_i'/R_i^{NE}$ and $R_i'/R_i^{SE}$ ($i = 1, 2$).

Table 4.5: Iterations required by different CRM algorithms (probability of required iterations).

Algorithm       t = 1    t = 2    t = 3    t = 4    t >= 5
CRM             0.59     0.29     0.06     0.04     0.02
Modified CRM    0.39     0.19     0.20     0.12     0.10

The  iteration  time  required  by  CRM  is  summarized  in  Table  4.5.  As  men¬ 
tioned  above,  CRM  stops  after  just  one  iteration  with  a  probability  of  59%  due 
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to  the  problem  mismatch  shown  in  Fig.  3.  In  most  scenarios,  CRM  terminates 
within  4  iterations  and  the  average  number  of  required  iteration  is  only  1.75.  To 
further  improve  the  performance  of  CRM,  we  can  modify  the  original  CRM  to 
handle  the  problem  mismatch  between  (3.21)  and  (4.4).  Notice  that  problem 
(4.4)  is  only  a  local  approximation  of  problem  (3.21)  at  P ,  t.  Additional  con¬ 
straints  can  be  added  in  Algorithm  4.3,  such  that  the  optimum  of  problem  (4.4) 
is  searched  only  in  a  certain  region  around  Plt  rather  than  the  whole  domain  of 
f^kpk^k.  For  example,  | P[k  —  P^t\  can  be  restricted  within  a  certain  threshold 
when  performing  Algorithm  4.3  for  any  k  G  {1, . . . ,  Ii}  .  We  simulated  the  two- 
user  scenarios  with  additional  restriction  of  |  P[k  —  Pkt  |  <  1.  Fig.  4.5  shows  the 
simulated cumulative probability of $R_i'/R_i^{NE}$ for this modified CRM. As opposed to CRM, the probability that the modified CRM returns the same power allocation strategy as IW is reduced to 39%, and the average performance improvement is also increased for both users. Specifically, the average performance improvement for user 1 is 29% and that for user 2 is 31%. However, Table V shows that the improvement is achieved at the cost of more iterations.
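The restriction described above can be read as a box constraint around the current operating point that is handed to the optimizer. The following is a minimal sketch of how such per-channel bounds might be built. The helper name `trust_region_bounds`, the list-based representation of the power vector, and the non-negativity floor are illustrative assumptions, not part of Algorithm 4.3.

```python
def trust_region_bounds(current, radius=1.0):
    """Return one (low, high) pair per channel.

    `current` is the power allocation of the present iteration (hypothetical
    representation: one float per frequency bin). Each new power is allowed
    to move at most `radius` away from its current value and may not become
    negative.
    """
    bounds = []
    for p_cur in current:
        low = max(0.0, p_cur - radius)  # transmitted power cannot be negative
        high = p_cur + radius
        bounds.append((low, high))
    return bounds
```

These bounds only describe the additional restriction; the total power constraint of the original problem would still have to be enforced by whatever solver consumes them.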

We also tested the performance of the modified CRM in multi-user cases where TSA cannot be applied. We simulated three-user scenarios with $P^{\max} = 200$ and $\sigma_k = 0.01$. The total power of all rays of $H_{ii}^k$ is normalized to one, and that of $H_{ij}^k$ ($i \neq j$) is normalized to 0.33. Fig. 4.6 shows the simulated cumulative probability of $R_i'/R_i^{NE}$. The average improvement of the modified CRM over IW is 29% for user 1 and 8% for the remaining users. We can see that most of the participants in the power control game benefit on average if a foresighted user forms accurate conjectures and plays the conjectural equilibrium strategy.

Figure 4.5: Cdfs of $R_i'/R_i^{NE}$ (i = 1,2) for modified CRM.

4.5  Concluding  Remarks 

This chapter introduces the concept of conjectural equilibrium in non-cooperative power control games and discusses how a foresighted user can model its experienced interference as a function of its own power allocation in order to improve its own data rate. The existence of conjectural equilibrium is proven, and both game theoretic solutions, Nash equilibrium and Stackelberg equilibrium, are shown to be special cases of this conjectural equilibrium. Practical algorithms based on conjectural equilibrium are developed to determine desirable power allocation strategies. Numerical results verify that a foresighted user forming proper conjectures can improve both its own achievable rate and the rates of other participants, even if it has no a priori knowledge of its competitors' private information. How to extend the framework to scenarios in which multiple foresighted users coexist is a topic for future investigation. While this chapter has focused on the power control game, the idea of forming conjectures based on the available local information is also applicable to any communication system where making foresighted decisions is beneficial, e.g. distributed routing in wired networks [KLO95].

Figure 4.6: Cdfs of $R_i'/R_i^{NE}$ (i = 1,2,3) for modified CRM.

4.6  Appendix  H:  A  sufficient  condition  for  problem  (4.4) 
to  be  convex 

Define $f_{a_1,a_2,b}(x) = \ln\left(1 + \frac{x}{a_1 + a_2 - bx}\right)$, in which the term $a_1 > 0$ represents the noise PSD. The second derivative of $f_{a_1,a_2,b}(x)$ is

$$f''_{a_1,a_2,b}(x) = -\frac{(a_1 + a_2)\left[-(a_1 + a_2)(2b - 1) + 2b(b - 1)x\right]}{(a_1 + a_2 - bx)^2\left[a_1 + a_2 - (b - 1)x\right]^2}.$$

Clearly, if $b \neq 0$, $f''_{a_1,a_2,b}(x)$ is not always negative. We restrict the domain of $f_{a_1,a_2,b}$ to be $\mathrm{dom}\, f_{a_1,a_2,b} = \{x \geq 0\} \cap \{a_2 - bx \geq 0\}$, because x is the transmitted power and $a_2 - bx$ represents the stationary interference, both of which are non-negative. We derive a sufficient condition that guarantees $f_{a_1,a_2,b}(x)$ is concave in $\mathrm{dom}\, f_{a_1,a_2,b}$:

$$a_2 > 0 \ \text{ and } \ b < 0.5\left(1 - \frac{a_2}{a_1}\right). \qquad (4.7)$$

This condition can be verified by inequality analysis. Clearly, $a_2 > 0$ leads to $a_1 + a_2 > 0$ and $b < 0.5\left(1 - \frac{a_2}{a_1}\right) < 1$. Therefore, $f''_{a_1,a_2,b}(x) < 0$ is equivalent to $-(a_1 + a_2)(2b - 1) + 2b(b - 1)x > 0$. We have

$$x \in \mathrm{dom}\, f_{a_1,a_2,b} \ \Rightarrow\ a_2 - bx \geq 0 \ \Rightarrow\ bx - a_2 \cdot \frac{a_1 + a_2}{a_2} \cdot \frac{b - 0.5}{b - 1} < 0 \ \Rightarrow\ -(a_1 + a_2)(2b - 1) + 2b(b - 1)x > 0,$$

because $\frac{a_1 + a_2}{a_2} \cdot \frac{b - 0.5}{b - 1} > 1$ when $b < 0.5\left(1 - \frac{a_2}{a_1}\right)$. Hence, the condition (4.7) leads to $f''_{a_1,a_2,b}(x) < 0$.
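The sufficient condition can be spot-checked numerically. The sketch below assumes the function has the form $f(x) = \ln(1 + x/(a_1 + a_2 - bx))$ and that the condition reads $a_2 > 0$ and $b < 0.5(1 - a_2/a_1)$. It evaluates the second derivative of this assumed form on sample points of the restricted domain. All function names, the sampling grid, and the cap `x_max` used when the domain is unbounded are illustrative choices.

```python
def second_derivative(x, a1, a2, b):
    """f''(x) for the assumed form f(x) = ln(1 + x / (a1 + a2 - b*x))."""
    c = a1 + a2
    numerator = c * (c * (2 * b - 1) - 2 * b * (b - 1) * x)
    denominator = (c - b * x) ** 2 * (c - (b - 1) * x) ** 2
    return numerator / denominator


def satisfies_condition(a1, a2, b):
    """Assumed sufficient condition: a2 > 0 and b < 0.5 * (1 - a2 / a1)."""
    return a2 > 0 and b < 0.5 * (1 - a2 / a1)


def domain_samples(a2, b, n=200, x_max=50.0):
    """Sample x >= 0 with a2 - b*x >= 0 (capped at x_max when unbounded)."""
    upper = a2 / b if b > 0 else x_max
    return [upper * i / n for i in range(n + 1)]


def is_concave_on_samples(a1, a2, b):
    return all(second_derivative(x, a1, a2, b) < 0
               for x in domain_samples(a2, b))
```

For instance, with $a_1 = 1$ and $a_2 = 0.5$, the value $b = 0.2$ meets the condition and the sampled second derivative stays negative, whereas $b = 0.45$ violates it and the second derivative becomes positive near the upper end of the domain.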

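Based on the sufficient condition (4.7), we can see that if $\beta^k > 0$ and $\gamma^k < 0.5\left(1 - \frac{\beta^k}{\sigma_1^k}\right)$ for any $k \in \{1, \ldots, K\}$, problem (4.4) is a convex program. Therefore, if the following sufficient condition

$$\left.\left(I_1^k - P_1^k \frac{\partial I_1^k}{\partial P_1^k}\right)\right|_{\mathbf{P}_1 = \mathbf{P}_1^{SE}} > 0 \ \text{ and } \ \left.\left(-\frac{\partial I_1^k}{\partial P_1^k}\right)\right|_{\mathbf{P}_1 = \mathbf{P}_1^{SE}} < \frac{1}{2} - \frac{1}{2\sigma_1^k}\left.\left(I_1^k - P_1^k \frac{\partial I_1^k}{\partial P_1^k}\right)\right|_{\mathbf{P}_1 = \mathbf{P}_1^{SE}} \qquad (C9)$$

holds, SE satisfies the KKT optimality condition and solves the convex programming problem (4.4), i.e. SE is also a CE.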


4.7  Appendix  I:  Proof  of  Theorem  4.3 

If the power control game $\Gamma$ satisfies condition (C9), then problem (4.4) is convex. We can use the following "maximum theorem" [SLS07, Ber97] to show that $\mathbf{P}_1(\boldsymbol{\beta}, \boldsymbol{\gamma})$ is continuous.

(Maximum Theorem) Let $\phi(x, y)$ be a real-valued continuous function with domain $X \times Y$, where $X \subset \mathcal{R}^m$ and $Y \subset \mathcal{R}^n$ are closed and bounded sets. Suppose that $\phi(x, y)$ is strictly concave in x for each y. Then the function $x^*(y) = \arg\max\{\phi(x, y) : x \in X\}$ is well-defined $\forall y \in Y$, and is continuous.

We  can  restrict  the  domain  of  parameters  0  and  7  in  closed  and  bounded  set, 
e.g.  |7fc|  <  M+,  in  which  M+  is  a  bound  satisfying  M+  ma xhk  (I  +  Gy1  gk. 

k 

Apply  the  maximum  theorem  with  </>  =  R(P1,0,'y)  =  Yhk=i  The 

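optimal solution $\mathbf{P}_1(\boldsymbol{\beta}, \boldsymbol{\gamma})$ of problem (4.4) is the function in the maximum theorem, and hence is a continuous function of $(\boldsymbol{\beta}, \boldsymbol{\gamma})$. As a result, $F(\boldsymbol{\beta}, \boldsymbol{\gamma})$ is also continuous in $(\boldsymbol{\beta}, \boldsymbol{\gamma})$. Note that $\mathbf{P}_1(\boldsymbol{\beta}, \boldsymbol{\gamma})$ and $F(\boldsymbol{\beta}, \boldsymbol{\gamma})$ are not necessarily continuously differentiable.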

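By the definition of $F(\boldsymbol{\beta}, \boldsymbol{\gamma})$ and conjectural equilibrium, we have that $F(\boldsymbol{\beta}, \boldsymbol{\gamma}) = 0$ implies conjectural equilibrium. Note that $F(\boldsymbol{\beta}^*, \boldsymbol{\gamma}^*) = 0$. If there exist open neighborhoods $A \subset \mathcal{R}^K$ and $B \subset \mathcal{R}^K$ of $\boldsymbol{\beta}^*$ and $\boldsymbol{\gamma}^*$, and for $\forall \boldsymbol{\gamma} \in B$, $F(\cdot, \boldsymbol{\gamma}) : A \to \mathcal{R}^K$ is locally one-to-one, then by the implicit function theorem [Kum80], there exist open neighborhoods $A_0 \subset \mathcal{R}^K$ and $B_0 \subset \mathcal{R}^K$ of $\boldsymbol{\beta}^*$ and $\boldsymbol{\gamma}^*$ such that for each $\boldsymbol{\gamma} \in B_0$, there is a unique $\boldsymbol{\beta}(\boldsymbol{\gamma})$ satisfying $F(\boldsymbol{\beta}(\boldsymbol{\gamma}), \boldsymbol{\gamma}) = 0$. Therefore, $\Gamma$ admits an infinite set of conjectural equilibria.

Alternatively, we can view $F(\boldsymbol{\beta}, \boldsymbol{\gamma}) = 0$ as equations with 2K unknowns; hence, the equilibrium is usually not a single point but a continuous surface. We can explore the structure of $\mathbf{P}_1(\boldsymbol{\beta}, \boldsymbol{\gamma})$ to derive the expression of this surface.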
Particularly,  under  condition  (C9),  the  solution  of  convex  problem  (4.4)  satisfies 

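$$\gamma^k(\gamma^k - 1)(P_1^k)^2 - (\sigma_1^k + \beta^k)(2\gamma^k - 1)P_1^k + (\sigma_1^k + \beta^k)^2 - \frac{\sigma_1^k + \beta^k}{\mu_1 - \lambda_1^k} = 0, \qquad (4.8)$$

where $\lambda_1^k$ and $\mu_1$ are the Lagrange multipliers as in (4.4). The optimal $\mathbf{P}_1(\boldsymbol{\beta}, \boldsymbol{\gamma}) = \{P_1^k(\boldsymbol{\beta}, \boldsymbol{\gamma})\}$ is given by

$$P_1^k(\boldsymbol{\beta}, \boldsymbol{\gamma}) = \frac{(\sigma_1^k + \beta^k)(2\gamma^k - 1)}{2\gamma^k(\gamma^k - 1)} + \frac{\sqrt{(\sigma_1^k + \beta^k)^2(2\gamma^k - 1)^2 - 4\gamma^k(\gamma^k - 1)\left[(\sigma_1^k + \beta^k)^2 - \frac{\sigma_1^k + \beta^k}{\mu_1 - \lambda_1^k}\right]}}{2\gamma^k(\gamma^k - 1)}. \qquad (4.9)$$

Note that the other root of equation (4.8) is removed by checking its feasibility in $\mathrm{dom}\, f_{\sigma_1^k,\beta^k,\gamma^k}$. By substituting (4.9) into (4.2) and (4.5), we can explicitly express $F(\boldsymbol{\beta}, \boldsymbol{\gamma})$ in terms of $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$, resulting in a very complex form of the surface. ■

CHAPTER 5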


Linearly  Coupled  Games 

5.1  Introduction 

In previous chapters, for ACSCG, we analyzed the sufficient condition under which there exists a unique NE and the best response dynamics globally converges to such a NE. We also investigated how to reduce the inefficiency of the NE. This can be accomplished by either enabling users to exchange coordination information in real-time or letting one foresighted leader model the reaction of the other myopic users and play the SE or CE strategy. In this chapter, we present another game model for a particular type of non-cooperative multi-user communication scenario in which multiple users playing the CE strategy may achieve Pareto optimality without any real-time information exchange. We name these linearly coupled games, because users' utilities are linearly impacted by their competitors' actions. In particular, the main contributions of this chapter are as follows. First, based on the assumptions that we make about the properties of users' utility, we characterize the inherent structures of the utility functions for the linearly coupled games. Furthermore, based on the derived utility forms, we explicitly quantify the NE and Pareto boundary for the linearly coupled games. The price of anarchy incurred by the selfish users playing the Nash strategy is quantified. In addition, to improve the performance in the non-cooperative scenarios, we investigate the CE-based solution. Using this approach, individual users are modeled as belief-forming agents that develop internal beliefs about their competitors and behave optimally with respect to their individual beliefs. Sufficient (and necessary) conditions that guarantee the convergence of different dynamic update mechanisms, including best response, gradient play, and Jacobi update, are addressed. We prove that these adjustment processes based on conjectures and non-cooperative individual optimization can be driven to Pareto-optimality in the linearly coupled games without the need for real-time coordination information exchange among agents. The investigated models apply to a variety of realistic applications encountered in multiple access design, including wireless random access and flow control.

The  rest  of  this  chapter  is  organized  as  follows.  Section  5.2  defines  the  linearly 
coupled  games.  For  the  investigated  game  models,  Section  5.3  explicitly  analyzes 
the  NE  and  Pareto  boundary  of  the  achievable  utility  region  and  defines  two 
basic types of linearly coupled games. We will discuss Type I games in detail in
Chapter 6. For Type II games, Section 5.4 quantifies the efficiency loss between
NE  and  Pareto  boundary  and  investigates  the  properties  of  CE  under  both  the 
best  response  and  Jacobi  update  dynamics.  Concluding  remarks  are  drawn  in 
Section  5.5. 

5.2  Linearly  Coupled  Games 

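Definition 5.1 A multi-user interaction is considered a linearly coupled game if the action set $\mathcal{A}_n \subseteq \mathcal{R}_+$ is convex and the utility function $u_n$ satisfies:

$$u_n(\mathbf{a}) = a_n^{\beta_n} \cdot s_n(\mathbf{a}), \qquad (5.1)$$

in which $\beta_n > 0$. In particular, the basic assumptions about $s_n(\mathbf{a})$ include:

Assumption 5.1 $s_n(\mathbf{a})$ is non-negative;

Assumption 5.2 Denote $s'_{nm}(\mathbf{a}) = \frac{\partial s_n(\mathbf{a})}{\partial a_m}$ and $s''_{nm}(\mathbf{a}) = \frac{\partial^2 s_n(\mathbf{a})}{\partial a_m^2}$. $s_n(\mathbf{a})$ is strictly linearly decreasing in $a_m$, $\forall m \neq n$, i.e. $s'_{nm}(\mathbf{a}) < 0$ and $s''_{nm}(\mathbf{a}) = 0$; $s_n(\mathbf{a})$ is non-increasing and linear in $a_n$, i.e. $s'_{nn}(\mathbf{a}) \leq 0$ and $s''_{nn}(\mathbf{a}) = 0$;

Assumption 5.3 $\frac{s_n(\mathbf{a})}{s'_{nm}(\mathbf{a})}$ is an affine function, $\forall n \in \mathcal{N} \setminus \{m\}$;

Assumption 5.4 $\frac{s'_{nm}(\mathbf{a})}{s_n(\mathbf{a})} = \frac{s'_{km}(\mathbf{a})}{s_k(\mathbf{a})}$, $\forall n, k \in \mathcal{N} \setminus \{m\}$, $\forall m$.

Assumptions 5.1 and 5.2 indicate that increasing $a_m$ for any $m \neq n$ within the domain of $s_n(\mathbf{a})$ will linearly decrease user n's utility. Assumptions 5.3 and 5.4 imply that a user's action has proportionally the same impact over the other users' utility. The structure of the utility functions that satisfy assumptions 5.1-5.4 will be addressed in the next section.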

5.3  Structure  of  Utility  Functions 

In  this  section,  we  show  that  the  computation  of  the  NE  and  the  Pareto  boundary 
in  linearly  coupled  games  is  equivalent  to  solving  linear  equations.  Moreover,  we 
investigate the inherent structures of the utility functions satisfying assumptions
5.1-5.4 and define two basic types of linearly coupled games.

5.3.1  Nash  Equilibrium 

We are interested in computing the NE in the linearly coupled games. From equation (5.1), we have


(5.2)

On one hand, if $s'_{nn}(\mathbf{a}) = 0$, $\forall n \in \mathcal{N}$, since user n's utility function strictly increases in $a_n$, we have a trivial NE at which $a_n^*$ is the maximal element in $\mathcal{A}_n$ that lies in the domain of $s(\cdot)$, $\forall n \in \mathcal{N}$.

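On the other hand, if $s'_{nn}(\mathbf{a}) \neq 0$, $\forall n \in \mathcal{N}$, according to assumption 5.3, since the multi-user interactions are linearly coupled, we have

$$s_n(\mathbf{a}) = f_n^m(\mathbf{a}_{-m}) + g_n^m(\mathbf{a}_{-m})\, a_m, \qquad (5.3)$$

where $f_n^m(\mathbf{a}_{-m})$, $g_n^m(\mathbf{a}_{-m})$ are both polynomials and $g_n^n(\mathbf{a}_{-n}) \neq 0$. From this, it follows

$$\frac{s'_{nn}(\mathbf{a})}{s_n(\mathbf{a})} = \left[\frac{f_n^n(\mathbf{a}_{-n})}{g_n^n(\mathbf{a}_{-n})} + a_n\right]^{-1}. \qquad (5.4)$$

At NE, we have

$$\frac{\partial \log[u_n(\mathbf{a})]}{\partial a_n} = \frac{\beta_n}{a_n} + \frac{s'_{nn}(\mathbf{a})}{s_n(\mathbf{a})} = 0. \qquad (5.5)$$

Under assumptions 5.3 and 5.4, $\frac{f_n^n(\mathbf{a}_{-n})}{g_n^n(\mathbf{a}_{-n})}$ is an affine function, which enables us to explicitly characterize the NE. Denote $\frac{f_n^n(\mathbf{a}_{-n})}{g_n^n(\mathbf{a}_{-n})} = h_n(\mathbf{a}_{-n})$. Equation (5.5) can be rewritten as

$$\beta_n \cdot h_n(\mathbf{a}_{-n}) + (\beta_n + 1) \cdot a_n = 0, \ \forall n \in \mathcal{N}. \qquad (5.6)$$

Therefore, the solutions of Equations (5.6) are the NE of the linearly coupled games, and computing the NE is equivalent to solving N-dimensional linear equations. The following theorem indicates the inherent structure of the utility functions $\{u_n\}_{n=1}^N$ when the requirements 5.1-5.3 are satisfied.

Theorem 5.1 Under assumptions 5.1-5.3, the irreducible factors of $s_n(\mathbf{a})$ over the integers are affine functions and have no variables in common.

Proof: Denote the factorization of $s_n(\mathbf{a})$ as

$$s_n(\mathbf{a}) = \prod_{i=1}^{M_n} b_n^i(\mathbf{a}), \qquad (5.7)$$

in which $M_n$ represents the number of the non-constant irreducible factors in $s_n(\mathbf{a})$. Define $V(\cdot)$ as the mapping from a polynomial to the set of variables that appear in that polynomial. Based on assumption 5.2, we immediately have

$$V(b_n^i(\mathbf{a})) \cap V(b_n^j(\mathbf{a})) = \emptyset, \ \forall i, j\ (j \neq i), n.$$

Without loss of generality, we assume that $a_j \in V(b_n^1(\mathbf{a}))$ and $b_n^1(\mathbf{a}) = f^j_{b_n^1}(\mathbf{a}_{-j}) + g^j_{b_n^1}(\mathbf{a}_{-j})\, a_j$. Then $f_n^j(\mathbf{a}_{-j})$, $g_n^j(\mathbf{a}_{-j})$ in (5.3) are given by

$$f_n^j(\mathbf{a}_{-j}) = f^j_{b_n^1}(\mathbf{a}_{-j}) \cdot \prod_{i=2}^{M_n} b_n^i(\mathbf{a}) \ \text{ and } \ g_n^j(\mathbf{a}_{-j}) = g^j_{b_n^1}(\mathbf{a}_{-j}) \cdot \prod_{i=2}^{M_n} b_n^i(\mathbf{a}).$$

Therefore, $\frac{f_n^j(\mathbf{a}_{-j})}{g_n^j(\mathbf{a}_{-j})} = \frac{f^j_{b_n^1}(\mathbf{a}_{-j})}{g^j_{b_n^1}(\mathbf{a}_{-j})}$. By assumption 5.3, we have that the degree of $\frac{f^j_{b_n^1}(\mathbf{a}_{-j})}{g^j_{b_n^1}(\mathbf{a}_{-j})}$ is less than or equal to 1. Since $b_n^1(\mathbf{a})$ is irreducible, we can conclude that $g^j_{b_n^1}(\mathbf{a}_{-j})$ is a constant and the degree of $f^j_{b_n^1}(\mathbf{a}_{-j})$ is less than or equal to 1. Note that the arguments above hold $\forall j, n$. Therefore, the degree of $b_n^i(\mathbf{a})$ is one, $\forall n \in \mathcal{N}$, $i = 1, \ldots, M_n$, which concludes the proof. ■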


5.3.2  Pareto  Boundary 

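Since log(·) is concave and $\log[u_n(\mathbf{a})]$ is a composition of affine functions [BV04], $u_n(\mathbf{a})$ is log-concave in $\mathbf{a}$ and the log-utility region $\log \mathcal{U}$ is convex. Therefore, we can characterize the Pareto boundary of the utility region as a set of $\mathbf{a}$ optimizing the following weighted proportional fairness objective¹:

$$\max_{\mathbf{a}} \sum_{n=1}^{N} \omega_n \log[u_n(\mathbf{a})], \qquad (5.8)$$

for all possible sets of $\{\omega_n\}$ satisfying $\omega_n \geq 0$ and $\sum_{n=1}^{N} \omega_n = 1$. Denote the optimal solution of problem (5.8) as $\mathbf{a}^{PB}$, which satisfies the following first-order condition:

$$\frac{\partial \sum_{k=1}^{N} \omega_k \log[u_k(\mathbf{a})]}{\partial a_n} = 0, \ \forall n \in \mathcal{N}. \qquad (5.9)$$

Under assumptions 5.1-5.3, the LHS of equation (5.9) can be rewritten as

$$\frac{\partial \sum_{k=1}^{N} \omega_k \log[u_k(\mathbf{a})]}{\partial a_m} = \omega_m\left[\frac{\beta_m}{a_m} + \frac{s'_{mm}(\mathbf{a})}{s_m(\mathbf{a})}\right] + \sum_{k \neq m} \omega_k \frac{s'_{km}(\mathbf{a})}{s_k(\mathbf{a})}. \qquad (5.10)$$

By Theorem 5.1 and assumption 5.4, we have

$$\frac{s'_{km}(\mathbf{a})}{s_k(\mathbf{a})} = \frac{1}{\psi_m(\mathbf{a})}, \ \forall k \in \mathcal{N} \setminus \{m\}, \qquad (5.11)$$

in which $\psi_m(\mathbf{a})$ is an affine function. Therefore, equation (5.10) is equivalent to

$$\frac{\partial \sum_{k=1}^{N} \omega_k \log[u_k(\mathbf{a})]}{\partial a_m} = \begin{cases} \beta_m \omega_m / a_m + (1 - \omega_m)/\psi_m(\mathbf{a}), & \text{if } s'_{mm}(\mathbf{a}) = 0; \\ \beta_m \omega_m / a_m + 1/\psi_m(\mathbf{a}), & \text{otherwise.} \end{cases} \qquad (5.12)$$

We can compute the Pareto boundary of the linearly coupled games by solving linear equations:

$$\frac{\partial \sum_{k=1}^{N} \omega_k \log[u_k(\mathbf{a})]}{\partial a_m} = 0 \ \Leftrightarrow\ \begin{cases} \beta_m \omega_m \psi_m(\mathbf{a}) + (1 - \omega_m)\, a_m = 0, & \text{if } s'_{mm}(\mathbf{a}) = 0; \\ \beta_m \omega_m \psi_m(\mathbf{a}) + a_m = 0, & \text{otherwise.} \end{cases} \qquad (5.13)$$

¹ Note that the utility region $\mathcal{U}$ is not necessarily convex. Therefore, its Pareto boundary may not be characterized by the weighted sum of $\{u_n(\mathbf{a})\}_{n=1}^N$.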

5.3.3  Two  Types  of  Linearly  Coupled  Games 

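Theorem 5.1 reveals the structural properties of the utility functions $\{u_n\}_{n=1}^N$ when assumptions 5.1-5.3 are satisfied. Based on Theorem 5.1, the following theorem further refines these properties of $\{u_n\}_{n=1}^N$ when the additional assumption 5.4 is imposed.

Theorem 5.2 Under assumptions 5.1-5.4, for any polynomial $b_n^i(\mathbf{a})$ in the factorization $s_n(\mathbf{a}) = \prod_{i=1}^{M_n} b_n^i(\mathbf{a})$, if $|V(b_n^i(\mathbf{a}))| \geq 2$ or $V(b_n^i(\mathbf{a})) = \{a_n\}$, $b_n^i(\mathbf{a})$ is an irreducible factor of $s_m(\mathbf{a})$, $\forall m \in \mathcal{N}$; if $V(b_n^i(\mathbf{a})) = \{a_m\}$, $m \neq n$, $b_n^i(\mathbf{a})$ is an irreducible factor of $s_j(\mathbf{a})$, $\forall j \in \mathcal{N} \setminus \{m\}$.

Proof: By assumption 5.2, $s'_{nm}(\mathbf{a}) < 0$, $\forall m \neq n$, so we have $|V(s_n(\mathbf{a}))| \geq N - 1$, $\forall n \in \mathcal{N}$. By Theorem 5.1, the irreducible factors of $s_n(\mathbf{a})$ have no common variables and they are affine functions. Suppose $|V(b_n^i(\mathbf{a}))| \geq 2$ and $\{a_m, a_l\} \subseteq V(b_n^i(\mathbf{a}))$. By assumption 5.4, we know that $\frac{s'_{km}(\mathbf{a})}{s_k(\mathbf{a})} = \frac{s'_{nm}(\mathbf{a})}{s_n(\mathbf{a})}$, $\forall k \in \mathcal{N} \setminus \{m\}$. Therefore, writing $b'^{\,i}_{nm}(\mathbf{a}) = \partial b_n^i(\mathbf{a}) / \partial a_m$, it follows

$$s_k(\mathbf{a}) = s'_{km}(\mathbf{a}) \cdot \frac{b_n^i(\mathbf{a})}{b'^{\,i}_{nm}(\mathbf{a})}. \qquad (5.14)$$

Since $b'^{\,i}_{nm}(\mathbf{a})$ is a constant, we can see that $b_n^i(\mathbf{a})$ is an irreducible factor of $s_k(\mathbf{a})$, $\forall k \in \mathcal{N} \setminus \{m\}$. By symmetry, we can conclude that $b_n^i(\mathbf{a})$ must also be an irreducible factor of $s_k(\mathbf{a})$, $\forall k \in \mathcal{N} \setminus \{l\}$. Therefore, $b_n^i(\mathbf{a})$ is an irreducible factor of $s_k(\mathbf{a})$, $\forall k \in \mathcal{N}$. Similarly, we can prove the remaining parts of Theorem 5.2. ■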


For the linearly coupled games satisfying assumptions 5.1-5.4, suppose we factorize all users' state functions. Theorem 5.2 indicates that any factor with at least two variables must be a common factor of all the users' state functions, and any factor with a single variable $a_k$ must be a common factor of the state functions of all users excluding k. In reality, it corresponds to the communication scenarios in which the state, i.e. the multi-user coupling, is impacted by a set of users that result in a similar signal to all the users.

We define two basic types of linearly coupled games satisfying the assumptions 5.1-5.4. In Type I games, user k's action linearly decreases all the users' states but its own. Hence, the utility functions take the form

$$u_n(\mathbf{a}) = a_n^{\beta_n} \cdot \prod_{m \neq n} (\mu_m - \tau_m a_m). \qquad (5.15)$$

In Type II games, all the users share the same non-factorizable state function and their utility functions are given by

$$u_n(\mathbf{a}) = a_n^{\beta_n} \cdot \left(\mu - \sum_{m=1}^{N} \tau_m a_m\right). \qquad (5.16)$$

5.3.4  Illustrative  Examples 

There  are  a  number  of  multi-user  communication  scenarios  that  can  be  modeled 
as  linearly  coupled  games.  For  example,  in  the  random  access  scenario,  the  action 
of  a  node  is  to  select  its  transmission  probability  and  a  node  n  will  independently 
attempt  transmission  of  a  packet  with  transmit  probability  pn.  The  action  set 
available to node n is $\mathcal{A}_n = [0, 1]$ for all $n \in \mathcal{N}$. In this case, the utility function
is  defined  as 


(5.17) 


As an additional example, in flow control [ZD92], N Poisson streams of packets are serviced by a single exponential server with departure rate $\mu$, and each class can adjust its throughput $r_n$. The utility function is defined as the weighted ratio of the throughput over the average experienced delay:

$$u_n(\mathbf{r}) = r_n^{\beta_n} \cdot \left(\mu - \sum_{m=1}^{N} r_m\right), \qquad (5.18)$$

in which $\beta_n > 0$ is interpreted as the weighting factor. Specifically, we can see that the state determination functions are $s_n(\mathbf{p}) = \prod_{m \in \mathcal{N} \setminus \{n\}} (1 - p_m)$ in (5.17) and $s_n(\mathbf{r}) = \mu - \sum_{m=1}^{N} r_m$ in (5.18). It is straightforward to verify that these functions satisfy assumptions 5.1-5.4 for both (5.17) and (5.18).
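The linearity claimed for the two state functions can be checked numerically. The sketch below is only an illustration: it implements the random access state $\prod_{m \neq n}(1 - p_m)$ and the flow control state $\mu - \sum_m r_m$ as stated in the text, and estimates the slope and curvature in one coordinate by central finite differences. The helper names, the step size `h`, and the sample points are assumptions made for this example.

```python
from math import prod


def random_access_state(p, n):
    """State of node n: probability that every other node stays silent."""
    return prod(1.0 - p_m for m, p_m in enumerate(p) if m != n)


def flow_control_state(r, mu):
    """State shared by all classes: residual service rate mu - sum(r)."""
    return mu - sum(r)


def slope_and_curvature(state_fn, a, m, h=1e-3):
    """Central finite-difference estimates of d s/d a_m and d^2 s/d a_m^2."""
    def shifted(delta):
        b = list(a)
        b[m] += delta
        return state_fn(b)

    low, mid, high = shifted(-h), state_fn(a), shifted(h)
    return (high - low) / (2 * h), (high - 2 * mid + low) / (h * h)
```

At `p = [0.2, 0.3, 0.5]`, for example, the state of node 0 has slope of about -0.5 in `p[1]` and zero slope in its own action `p[0]`, while the flow control state has slope -1 in every coordinate, including the user's own. The estimated curvature is numerically zero in all cases, in line with the linear coupling described above.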

As special examples, the random access problem in (5.17) belongs to Type I games and the rate control problem in (5.18) belongs to Type II games. In fact, all the games that have the properties 5.1-5.4 can be viewed as compositions of these two basic types of games. Therefore, investigating the two basic types provides us with a fundamental understanding of the linearly coupled multi-user interaction. We are interested in comparing the achievable performance attained by different game-theoretic solution concepts. On one hand, it is well-known that NE is generally inefficient in games [Dub86], but it may not require explicit message exchanges, while Pareto-optimality can usually be achieved only by exchanging implicit or explicit coordination messages among the participating users. On the other hand, in previous chapters, we have applied the CE-based solution in different communication scenarios to improve the system performance in non-cooperative settings. The remaining parts of the dissertation aim to compare the solutions of NE, Pareto boundary, and CE in terms of the payoffs and informational requirements in different types of linearly coupled games. Specifically, Section 5.4 will focus on Type II games and Chapter 6 will use random access to illustrate the properties of various solutions in Type I games.

5.4  Solutions  for  Type  II  Linearly  Coupled  Games 

5.4.1  Nash  Equilibrium  and  Pareto  Boundary 

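For Type II games with utility functions given in (5.16), we have

$$\frac{s'_{nn}(\mathbf{a})}{s_n(\mathbf{a})} = -\frac{\tau_n}{\mu - \sum_{m=1}^{N} \tau_m a_m}. \qquad (5.19)$$

Therefore, Equation (5.6) can be reduced to

$$(1 + \beta_n)\tau_n a_n + \beta_n \sum_{m \neq n} \tau_m a_m = \beta_n \mu, \ \forall n \in \mathcal{N}. \qquad (5.20)$$

The solution of the linear equations gives the NE, and its closed form has been addressed in [DM92] for $\tau_n = 1$, $\forall n \in \mathcal{N}$. For the general case, it is easy to verify that the NE is given by

$$a_n^{NE} = \frac{\beta_n \mu}{\tau_n\left(1 + \sum_{m=1}^{N} \beta_m\right)}, \ \forall n \in \mathcal{N}. \qquad (5.21)$$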

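Similarly, to compute the Pareto boundary of Type II games, Equation (5.12) can be reduced to

$$(1 + \omega_n \beta_n)\tau_n a_n + \omega_n \beta_n \sum_{m \neq n} \tau_m a_m = \omega_n \beta_n \mu, \ \forall n \in \mathcal{N}. \qquad (5.22)$$

The solution is given by

$$a_n^{PB} = \frac{\omega_n \beta_n \mu}{\tau_n\left(1 + \sum_{m=1}^{N} \omega_m \beta_m\right)}, \ \forall n \in \mathcal{N}. \qquad (5.23)$$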
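The closed forms for the NE and the Pareto boundary can be sanity-checked by substituting them back into the linear equations they are meant to solve. The sketch below assumes the Type II utility $u_n = a_n^{\beta_n}(\mu - \sum_m \tau_m a_m)$ and the linear system $(1 + w_n)\tau_n a_n + w_n \sum_{m \neq n} \tau_m a_m = w_n \mu$, with $w_n = \beta_n$ for the NE and $w_n = \omega_n \beta_n$ for the Pareto boundary. The function names and the sample parameters are illustrative.

```python
def nash_actions(beta, tau, mu):
    """Assumed closed form: a_n = beta_n * mu / (tau_n * (1 + sum(beta)))."""
    total = 1.0 + sum(beta)
    return [b * mu / (t * total) for b, t in zip(beta, tau)]


def pareto_actions(beta, tau, mu, omega):
    """Assumed closed form with beta_n replaced by omega_n * beta_n."""
    weights = [o * b for o, b in zip(omega, beta)]
    total = 1.0 + sum(weights)
    return [w * mu / (t * total) for w, t in zip(weights, tau)]


def residuals(actions, weights, tau, mu):
    """Left side minus right side of the linear system, one entry per user."""
    load = sum(t * a for t, a in zip(tau, actions))
    return [(1 + w) * t * a + w * (load - t * a) - w * mu
            for w, t, a in zip(weights, tau, actions)]
```

With, say, `beta = [1.0, 2.0, 0.5]`, `tau = [1.0, 0.5, 2.0]` and `mu = 10.0`, the residuals vanish up to rounding for both solutions, and the remaining state `mu - load` stays positive.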

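From Section 5.3.2, we know that the region $\log \mathcal{U}$ is convex. Therefore, we can compare the efficiency of $\mathbf{a}^{NE}$ and $\mathbf{a}^{PB}$ using the system-utility metric $\sum_{n=1}^{N} \omega_n \log[u_n(\mathbf{a})]$. Specifically, we have

$$\sum_{n=1}^{N} \omega_n \log[u_n(\mathbf{a}^{NE})] - \sum_{n=1}^{N} \omega_n \log[u_n(\mathbf{a}^{PB})] = \log\frac{1 + \sum_{m=1}^{N} \omega_m \beta_m}{1 + \sum_{m=1}^{N} \beta_m} + \sum_{n=1}^{N} \omega_n \beta_n \log\frac{1 + \sum_{m=1}^{N} \omega_m \beta_m}{\omega_n\left(1 + \sum_{m=1}^{N} \beta_m\right)}. \qquad (5.24)$$

Denote $w_0 = 1$, $x_0 = \frac{1 + \sum_{m=1}^{N} \omega_m \beta_m}{1 + \sum_{m=1}^{N} \beta_m}$, $w_n = \omega_n \beta_n$, and $x_n = \frac{1 + \sum_{m=1}^{N} \omega_m \beta_m}{\omega_n\left(1 + \sum_{m=1}^{N} \beta_m\right)}$, $\forall n \in \mathcal{N}$. Then

$$\sum_{n=1}^{N} \omega_n \log\left(\frac{u_n(\mathbf{a}^{NE})}{u_n(\mathbf{a}^{PB})}\right) = \sum_{n=0}^{N} w_n \log x_n = \sum_{n=0}^{N} w_n \cdot \log\left(\prod_{n=0}^{N} x_n^{w_n}\right)^{1/\sum_{n=0}^{N} w_n}. \qquad (5.25)$$

Using the inequalities among the arithmetic, geometric and harmonic means [Spi68], we have

$$\frac{\sum_{n=0}^{N} w_n}{\sum_{n=0}^{N} w_n / x_n} \leq \left(\prod_{n=0}^{N} x_n^{w_n}\right)^{1/\sum_{n=0}^{N} w_n} \leq \frac{\sum_{n=0}^{N} w_n x_n}{\sum_{n=0}^{N} w_n} = 1. \qquad (5.26)$$

Both inequalities hold with equality if and only if $x_0 = x_1 = \ldots = x_N$, i.e. $\omega_1 = \ldots = \omega_N = 1$. However, since we require $\sum_{n=1}^{N} \omega_n = 1$, (5.26) holds as strict inequalities, which leads to

$$\left(1 + \sum_{n=1}^{N} \omega_n \beta_n\right) \cdot \log\frac{\left(1 + \sum_{n=1}^{N} \omega_n \beta_n\right)^2}{\left(1 + \sum_{n=1}^{N} \beta_n\right)\left(1 + \sum_{n=1}^{N} \omega_n^2 \beta_n\right)} < \sum_{n=1}^{N} \omega_n \log[u_n(\mathbf{a}^{NE})] - \sum_{n=1}^{N} \omega_n \log[u_n(\mathbf{a}^{PB})] < 0. \qquad (5.27)$$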

Based on Equation (5.27), we can make two important observations. First, due to the lack of coordination, the NE in Type II games is always strictly Pareto inefficient. Second, the efficiency loss in Type II games is bounded, which means that every user receives a positive payoff at NE. Noticing that the performance gap between $u_n(\mathbf{a}^{NE})$ and $u_n(\mathbf{a}^{PB})$ is non-zero, we will investigate how the non-cooperative CE solution can improve the system performance for Type II games.
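The two observations can be illustrated with a small numerical experiment. The sketch below assumes the Type II utility $u_n = a_n^{\beta_n}(\mu - \sum_m \tau_m a_m)$ together with the closed forms $a_n^{NE} = \beta_n \mu / (\tau_n(1 + \sum_m \beta_m))$ and $a_n^{PB} = \omega_n \beta_n \mu / (\tau_n(1 + \sum_m \omega_m \beta_m))$. It evaluates the weighted log-utility at both points. The function names and parameter values are hypothetical, and the weights are taken strictly positive so that all logarithms are defined.

```python
from math import log


def nash_actions(beta, tau, mu):
    total = 1.0 + sum(beta)
    return [b * mu / (t * total) for b, t in zip(beta, tau)]


def pareto_actions(beta, tau, mu, omega):
    weights = [o * b for o, b in zip(omega, beta)]
    total = 1.0 + sum(weights)
    return [w * mu / (t * total) for w, t in zip(weights, tau)]


def weighted_log_utility(actions, beta, tau, mu, omega):
    """sum_n omega_n * log(a_n**beta_n * (mu - sum_m tau_m * a_m))."""
    state = mu - sum(t * a for t, a in zip(tau, actions))
    return sum(o * (b * log(a) + log(state))
               for o, b, a in zip(omega, beta, actions))


def efficiency_gap(beta, tau, mu, omega):
    """System utility at the NE minus system utility on the Pareto boundary."""
    ne = nash_actions(beta, tau, mu)
    pb = pareto_actions(beta, tau, mu, omega)
    return (weighted_log_utility(ne, beta, tau, mu, omega)
            - weighted_log_utility(pb, beta, tau, mu, omega))
```

For every strictly positive weight vector that sums to one, `efficiency_gap` should come out negative, reflecting the strict Pareto inefficiency of the NE, while the weighted log-utility at the NE stays finite because every user's payoff is positive.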

5.4.2  Linear  Beliefs 

According  to  the  definition  of  CE  in  (4.1),  all  players’  expectations  based  on  their 
beliefs  are  realized  and  each  agent  behaves  optimally  according  to  its  expectation. 
In  other  words,  agents’  beliefs  are  consistent  with  the  outcome  of  the  play  and 
they  use  “conjectured  best  responses”  in  their  individual  optimization  program. 
The  key  challenges  are  how  to  configure  the  belief  functions  such  that  cooperation 
can  be  sustained  in  such  a  non-cooperative  setting  and  how  to  design  the  evolution 
rules  such  that  the  communication  system  can  dynamically  converge  to  a  CE 
having  satisfactory  performance. 

To define the belief functions, we need to express agent n's expected state $\tilde{s}_n$ as a function of its own action $a_n$. The simplest approach is to design linear belief models for each user, i.e. player n's belief function takes the form

$$\tilde{s}_n(a_n) = \bar{s}_n - \lambda_n (a_n - \bar{a}_n), \qquad (5.28)$$

for $n \in \mathcal{N}$. The values of $\bar{s}_n$ and $\bar{a}_n$ are specific states and actions, called reference points, and $\lambda_n$ is a positive scalar. In other words, user n assumes that other players will observe its deviation from its reference point $\bar{a}_n$ and the aggregate state deviates from the reference point $\bar{s}_n$ by a quantity proportional to the deviation $a_n - \bar{a}_n$. How to configure $\bar{s}_n$, $\bar{a}_n$, and $\lambda_n$ will be addressed in the rest of this chapter. We focus on the linear belief represented in (5.28), because this simple belief form is sufficient to drive the resulting non-cooperative equilibrium to the Pareto boundary.

The goal of user n is to maximize its expected utility $a_n^{\beta_n} \cdot \tilde{s}_n(a_n)$, taking into account the conjectures that it has made about the other users. Therefore, the optimization a user needs to solve becomes:

$$\max_{a_n \in \mathcal{A}_n} a_n^{\beta_n} \cdot \left[\bar{s}_n - \lambda_n (a_n - \bar{a}_n)\right]. \qquad (5.29)$$

For $\lambda_n > 0$, user n believes that increasing $a_n$ will further reduce its conjectured state $\tilde{s}_n$. The optimal solution of (5.29) is given by

$$a_n^* = \frac{\beta_n (\bar{s}_n + \lambda_n \bar{a}_n)}{\lambda_n (1 + \beta_n)}. \qquad (5.30)$$

In the following, we first show that forming simple linear beliefs in (5.28) can cause all the operating points in the achievable utility region to be CE.
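The closed-form conjectured best response can be compared against a brute-force search. The sketch below assumes the linear belief $\tilde{s}(a) = \bar{s} - \lambda(a - \bar{a})$ and the conjectured utility $a^{\beta}\tilde{s}(a)$, and searches a uniform grid over the interval on which the conjectured state is non-negative. The names, the grid resolution, and the sample parameters are illustrative assumptions.

```python
def conjectured_utility(a, beta, s_ref, a_ref, lam):
    """a**beta times the linear belief s_ref - lam * (a - a_ref)."""
    return a ** beta * (s_ref - lam * (a - a_ref))


def closed_form_best_response(beta, s_ref, a_ref, lam):
    """Assumed optimum: beta * (s_ref + lam * a_ref) / (lam * (1 + beta))."""
    return beta * (s_ref + lam * a_ref) / (lam * (1 + beta))


def grid_best_response(beta, s_ref, a_ref, lam, points=9000):
    """Maximize over a uniform grid on [0, a_ref + s_ref / lam]."""
    upper = a_ref + s_ref / lam  # beyond this the conjectured state is negative
    grid = [upper * i / points for i in range(points + 1)]
    return max(grid, key=lambda a: conjectured_utility(a, beta, s_ref, a_ref, lam))
```

With `beta = 2.0`, `s_ref = 4.0`, `a_ref = 1.0` and `lam = 0.5`, the closed form evaluates to 6.0, and the grid search returns a point within the grid spacing of that value.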


Theorem 5.3 For Type II games, all the positive operating points in the utility region $\mathcal{U}$ are essentially CE.

Proof: For each positive operating point $(u_1^*, \ldots, u_N^*)$ (i.e. $u_n^* > 0$, $\forall n \in \mathcal{N}$) in the utility region $\mathcal{U}$, there exists at least one joint action profile $(a_1^*, \ldots, a_N^*) \in \mathcal{A}$ such that $u_n^* = u_n(\mathbf{a}^*)$, $\forall n \in \mathcal{N}$. We consider setting the parameters in the belief functions $\{\tilde{s}_n(a_n)\}_{n=1}^N$ to be:

$$\lambda_n^* = \beta_n \cdot \frac{\mu - \sum_{m=1}^{N} \tau_m a_m^*}{a_n^*}, \ \forall n \in \mathcal{N}. \qquad (5.31)$$

It is easy to check that, if the reference points are $\bar{s}_n = \mu - \sum_{m=1}^{N} \tau_m a_m^*$, $\bar{a}_n = a_n^*$, we have $\tilde{s}_n(a_n^*) = s_n(a_1^*, \ldots, a_N^*)$ and $a_n^* = \arg\max_{a_n \in \mathcal{A}_n} u_n(\tilde{s}_n(a_n), a_n)$. Therefore, this belief function configuration and the joint action $\mathbf{a}^* = (a_1^*, \ldots, a_N^*)$ constitute the CE that results in the utility $(u_1^*, \ldots, u_N^*)$. ■

Theorem 5.3 establishes the existence of CE, i.e. for a particular $\mathbf{a}^* \in \mathcal{A}$, how to choose the parameters $\{\bar{s}_n, \bar{a}_n, \lambda_n\}_{n=1}^N$ such that $\mathbf{a}^*$ is a CE. However, it neither tells us how these CE can be achieved and sustained in the dynamic setting nor clarifies how different belief configurations can lead to various CE.
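The construction used in the proof can be traced with a short computation. The sketch below assumes the Type II state $\mu - \sum_m \tau_m a_m$, the belief slope $\lambda_n^* = \beta_n(\mu - \sum_m \tau_m a_m^*)/a_n^*$, and the conjectured best response $\beta_n(\bar{s}_n + \lambda_n \bar{a}_n)/(\lambda_n(1 + \beta_n))$. It picks an arbitrary positive action profile, builds the beliefs, and checks that each user's conjectured best response reproduces its own action. All names and numbers are hypothetical.

```python
def belief_parameters(a_star, beta, tau, mu):
    """Return the common reference state and one belief slope per user."""
    s_star = mu - sum(t * a for t, a in zip(tau, a_star))
    slopes = [b * s_star / a for b, a in zip(beta, a_star)]
    return s_star, slopes


def conjectured_best_responses(a_ref, s_ref, beta, slopes):
    """Best response of every user to its own linear belief."""
    return [b * (s_ref + lam * a) / (lam * (1 + b))
            for a, b, lam in zip(a_ref, beta, slopes)]


def is_conjectural_equilibrium(a_star, beta, tau, mu, tol=1e-9):
    s_star, slopes = belief_parameters(a_star, beta, tau, mu)
    responses = conjectured_best_responses(a_star, s_star, beta, slopes)
    return all(abs(r - a) < tol for r, a in zip(responses, a_star))
```

For `a_star = [1.0, 2.0, 0.5]`, `beta = [1.0, 2.0, 0.5]`, `tau = [1.0, 0.5, 2.0]` and `mu = 10.0`, the reference state is 7.0 and every slope equals 7.0, so no user has an incentive to move under its belief. The check only covers profiles that leave the state positive.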


We consider the dynamic scenarios in which users revise their reference points based on their past local observations over time. Let $s_n^t$, $a_n^t$, $\tilde{s}_n^t$, $\bar{s}_n^t$, $\bar{a}_n^t$ be user n's state, action, belief function, and reference points at stage t, in which $s_n^t = \mu - \sum_{m=1}^{N} \tau_m a_m^t$. We propose a simple rule for individual users to update their reference points. At stage t, user n sets its $\bar{s}_n^t$ and $\bar{a}_n^t$ to be $s_n^{t-1}$ and $a_n^{t-1}$. In other words, user n's conjectured utility function at stage t is

$$u_n^t(\tilde{s}_n^t(a_n), a_n) = a_n^{\beta_n} \cdot \left[\mu - \sum_{m=1}^{N} \tau_m a_m^{t-1} - \lambda_n (a_n - a_n^{t-1})\right]. \qquad (5.32)$$

Since we have defined the users' utility function at stage t, upon specifying the rule of how user n updates its action $a_n^t$ based on its utility function $u_n^t(\tilde{s}_n^t(a_n), a_n)$, the trajectory of the entire dynamic process is determined. The remainder of this chapter will investigate the dynamic properties of the best response and Jacobi update mechanisms and the performance trade-off among the competing users at the resulting steady-state CE. In particular, for fixed $\{\lambda_n\}_{n=1}^N$, Section 5.4.3 derives necessary and sufficient conditions for the convergence of the best response and the Jacobi update dynamics. Section 5.4.4 quantitatively describes the limiting CE for given $\{\lambda_n\}_{n=1}^N$ and investigates how the parameters $\{\lambda_n\}_{n=1}^N$ should be properly chosen such that Pareto efficiency can be achieved.


5.4.3  Dynamic  Algorithms 
5.4.3.1 Best Response


In the best response algorithm, each user updates its action using the best response that maximizes its conjectured utility function in (5.32). Therefore, at stage t, user n chooses its action according to

$$a_n^t = \frac{\beta_n \left(\mu - \sum_{m \in \mathcal{N} \setminus \{n\}} \tau_m a_m^{t-1}\right)}{\lambda_n (1 + \beta_n)} + \frac{\beta_n (\lambda_n - \tau_n)\, a_n^{t-1}}{\lambda_n (1 + \beta_n)}. \qquad (5.33)$$

We are interested in characterizing the convergence of the update mechanism defined by (5.33) when using various $\lambda_n$ to initialize the belief function $\tilde{s}_n$.


To analyze the convergence of the best response dynamics, we consider the Jacobian matrix of the self-mapping function in (5.33). Let $J_{ik}$ denote the element at row i and column k of the Jacobian matrix $\mathbf{J}$. The elements of the Jacobian matrix $\mathbf{J}^{BR}$ of (5.33) are defined as:

$$J_{ik}^{BR} = \frac{\partial a_i^t}{\partial a_k^{t-1}} = \begin{cases} \dfrac{\beta_i (\lambda_i - \tau_i)}{\lambda_i (1 + \beta_i)}, & \text{if } i = k; \\[2mm] -\dfrac{\beta_i \tau_k}{\lambda_i (1 + \beta_i)}, & \text{if } i \neq k. \end{cases} \qquad (5.34)$$

For Type II games, the following theorem gives a necessary and sufficient condition under which the best response dynamics defined in (5.33) converges.


Theorem 5.4 For Type II games, a necessary and sufficient condition for the
best response dynamics to converge is

$$\sum_{n=1}^{N} \frac{r_n \beta_n}{\lambda_n(1+2\beta_n)} < 1. \tag{5.35}$$

Proof: The best response dynamics converges if and only if the eigenvalues
$\{\xi_n^{BR}\}_{n=1}^N$ of the Jacobian matrix $\mathbf{J}^{BR}$ in (5.34) are all inside the unit circle of the
complex plane [GD03], i.e. $|\xi_n^{BR}| < 1, \forall n \in \mathcal{N}$. To determine the eigenvalues of
$\mathbf{J}^{BR}$, we have

Therefore, we can see that the eigenvalues of $\mathbf{J}^{BR}$ are the roots of

(5.36)


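Denote $q(\xi) = \sum_{n=1}^{N} \frac{r_n}{\lambda_n\left(1 - \frac{1+\beta_n}{\beta_n}\xi\right)}$. First, we assume that $\beta_i \neq \beta_j, \forall i \neq j$. Without
loss of generality, consider $\beta_1 < \beta_2 < \cdots < \beta_N$. In this case, the eigenvalues
of $\mathbf{J}^{BR}$ are the roots of $q(\xi) = 1$. Note that $q(\xi)$ is continuous and strictly
increasing on each of the intervals $\left(-\infty, \frac{\beta_1}{1+\beta_1}\right)$, $\left(\frac{\beta_1}{1+\beta_1}, \frac{\beta_2}{1+\beta_2}\right)$, $\ldots$,
$\left(\frac{\beta_{N-1}}{1+\beta_{N-1}}, \frac{\beta_N}{1+\beta_N}\right)$, and $\left(\frac{\beta_N}{1+\beta_N}, +\infty\right)$. We also have
$\lim_{\xi \to \left(\frac{\beta_n}{1+\beta_n}\right)^-} q(\xi) = +\infty$, $\lim_{\xi \to \left(\frac{\beta_n}{1+\beta_n}\right)^+} q(\xi) = -\infty$, $n = 1, 2, \cdots, N$,
and $\lim_{\xi \to -\infty} q(\xi) = \lim_{\xi \to +\infty} q(\xi) = 0$. Therefore, the roots of $q(\xi) = 1$ lie in
$\left(-\infty, \frac{\beta_1}{1+\beta_1}\right)$, $\left(\frac{\beta_1}{1+\beta_1}, \frac{\beta_2}{1+\beta_2}\right)$, $\ldots$, $\left(\frac{\beta_{N-1}}{1+\beta_{N-1}}, \frac{\beta_N}{1+\beta_N}\right)$. Since
$q(\xi)$ strictly increases in $\left(-\infty, \frac{\beta_1}{1+\beta_1}\right)$, we have $|\xi_n^{BR}| < 1, \forall n \in \mathcal{N}$ if and only
if $q(-1) = \sum_{n=1}^{N} \frac{r_n\beta_n}{\lambda_n(1+2\beta_n)} < 1$.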

Second, we consider the cases in which there exists $\beta_i = \beta_j$ for certain $i, j$.
Suppose that $\{\beta_n\}_{n=1}^N$ take $K$ discrete values $\kappa_1, \cdots, \kappa_K$ and the number of
$\{\beta_n\}_{n=1}^N$ that equal $\kappa_k$ is $n_k$. In this case, Equation (5.36) is reduced to

Hence, equation $q(\xi) = 1$ has $N + K - \sum_{k=1}^{K} n_k$ roots in total, and $\xi = \frac{\kappa_k}{1+\kappa_k}$
is a root of multiplicity $n_k - 1$ for Equation (5.37), $\forall k$. All these roots are the
eigenvalues of matrix $\mathbf{J}^{BR}$. Similarly, the roots of $q(\xi) = 1$ lie in $\left(-\infty, \frac{\kappa_1}{1+\kappa_1}\right)$
and in the intervals between consecutive values of $\frac{\kappa_k}{1+\kappa_k}$, taking
$\kappa_1 < \cdots < \kappa_K$. A necessary and sufficient condition under
which $|\xi_n^{BR}| < 1, \forall n \in \mathcal{N}$ is still $q(-1) < 1$, i.e. $\sum_{n=1}^{N} \frac{r_n\beta_n}{\lambda_n(1+2\beta_n)} < 1$.
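The argument in the proof can be checked numerically. The sketch below assumes the secular function takes the form $q(\xi) = \sum_n r_n / [\lambda_n(1 - \frac{1+\beta_n}{\beta_n}\xi)]$, so that $q(-1)$ equals the left-hand side of (5.35), and locates the smallest eigenvalue by bisection on $(-\infty, \min_n \frac{\beta_n}{1+\beta_n})$, where $q$ is increasing. The function names and the test parameters are illustrative only.

```python
def q(xi, beta, r, lam):
    """Assumed secular function whose solutions of q(xi) = 1 are the eigenvalues."""
    return sum(rn / (ln * (1.0 - (1.0 + bn) / bn * xi))
               for bn, rn, ln in zip(beta, r, lam))


def br_condition_lhs(beta, r, lam):
    """Left-hand side of (5.35), which coincides with q(-1)."""
    return sum(rn * bn / (ln * (1.0 + 2.0 * bn))
               for bn, rn, ln in zip(beta, r, lam))


def smallest_eigenvalue(beta, r, lam, iterations=200):
    """Root of q(xi) = 1 to the left of the first pole, found by bisection."""
    pole = min(b / (1.0 + b) for b in beta)
    gap = 1e-3
    hi = pole - gap
    while q(hi, beta, r, lam) <= 1.0:   # move right until q(hi) > 1
        gap /= 2.0
        hi = pole - gap
    lo = -1.0
    while q(lo, beta, r, lam) >= 1.0:   # move left until q(lo) < 1
        lo *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if q(mid, beta, r, lam) < 1.0:
            lo = mid                    # q is increasing, so the root is to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    beta, r = [1.5, 1.0, 0.5], [3.0, 4.0, 5.0]
    for lam in ([9.0, 12.0, 15.0], [3.0, 4.0, 5.0], [1.5, 2.0, 2.5]):
        print(lam, br_condition_lhs(beta, r, lam), smallest_eigenvalue(beta, r, lam))
```

In this sketch the smallest eigenvalue drops below $-1$ exactly when the left-hand side of (5.35) exceeds 1, which is the mechanism the proof relies on.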


Remark 5.1 Theorem 5.4 indicates that, if the condition in (5.35) is satisfied,
the best response dynamics converges linearly to the CE. The convergence
rate is mainly determined by $\max_{n \in \mathcal{N}} |\xi_n^{BR}|$. Suppose $\beta_1 \le \beta_2 \le \cdots \le \beta_N$
and $\xi_1^{BR} \le \xi_2^{BR} \le \cdots \le \xi_N^{BR}$. From the proof of Theorem 5.4, we can see
that, under condition (5.35), $-1 < \xi_1^{BR} < \frac{\beta_1}{1+\beta_1} \le \xi_2^{BR} \le \cdots \le \xi_N^{BR}$, and
$\frac{\beta_{N-1}}{1+\beta_{N-1}} \le \xi_N^{BR} \le \frac{\beta_N}{1+\beta_N}$. Therefore, the rate of convergence can be approximated by
$\max\{|\xi_1^{BR}|, |\xi_N^{BR}|\}$. Note that choosing larger $\{\lambda_n\}_{n=1}^N$ increases $\xi_1^{BR}$. Hence, if
$\xi_1^{BR}$ is negative and dominates in magnitude, i.e. $|\xi_1^{BR}| > |\xi_N^{BR}|$, increasing
$\{\lambda_n\}_{n=1}^N$, i.e. having more self-constrained users, accelerates the convergence
of the best response mechanism. On the other hand, since
$\xi_N^{BR} \ge \frac{\beta_{N-1}}{1+\beta_{N-1}}$, the convergence rate is lower bounded by $\frac{\beta_{N-1}}{1+\beta_{N-1}}$.
Therefore, if at least two users associate large weighting factors $\beta$ with their
individual actions in the utility functions, we have $\frac{\beta_{N-1}}{1+\beta_{N-1}} \to 1$ and the best
response dynamics converges slowly.

Remark 5.2 Theorem 5.4 generalizes the necessary and sufficient condition
derived in [DM92], where users are assumed to be symmetric, i.e. $r_n = 1, \forall n$, and
they adopt the Nash strategy by choosing $\lambda_n = r_n, \forall n$. Due to the lack of symmetry,
the derivation in [DM92] is not readily applicable to analyze the convergence of
the best response dynamics considered here. The proof of Theorem 5.4 instead directly
characterizes the eigenvalues of the Jacobian matrix, and hence provides a more general
convergence analysis of the dynamic algorithms that allow users to update their
actions based on their independent linear conjectures.

Remark  5.3  In  Type  II  games,  a  locally  stable  CE  is  also  globally  convergent, 
which  is  purely  due  to  the  property  of  its  utility  functions  specified  in  (5.16).  From 
(5.34), we can see that all the elements in $\mathbf{J}^{BR}$ are independent of the joint play
$\mathbf{a}^{t-1}$. This is in contrast with Type I games that will be considered in Chapter 6,
where  local  stability  of  a  CE  may  not  imply  its  global  convergence  and  the  best 
response  dynamics  may  only  converge  if  the  operating  point  is  close  enough  to  the 
steady-state  equilibrium. 

5.4.3.2 Jacobi Update

We consider an alternative strategy update mechanism called Jacobi update
[LA02]. In Jacobi update, every user adjusts its action gradually towards the best
response strategy. At stage $t$, user $n$ chooses its action according to

$$a_n^t = J_n(\mathbf{a}^{t-1}) := a_n^{t-1} + \epsilon\left[B_n(\mathbf{a}^{t-1}) - a_n^{t-1}\right], \tag{5.38}$$

in which the stepsize $\epsilon > 0$ and $B_n(\mathbf{a}^{t-1})$ is defined in (5.33). The following
theorem establishes the convergence property of the Jacobi update dynamics.

Theorem 5.5 In Type II games, for given $\{r_n, \beta_n, \lambda_n\}_{n=1}^N$, the Jacobi update
dynamics converges if the stepsize $\epsilon$ is sufficiently small.

Proof: The Jacobian matrix $\mathbf{J}^{JU}$ of the self-mapping function (5.38) satisfies
$\mathbf{J}^{JU} = (1 - \epsilon)\mathbf{I} + \epsilon\mathbf{J}^{BR}$. Therefore, its eigenvalues $\{\xi_n^{JU}\}_{n=1}^N$ are given by $\xi_n^{JU} =
1 - \epsilon + \epsilon\xi_n^{BR}$. From the proof of Theorem 5.4, we know that $\xi_n^{BR} < 1, \forall n \in \mathcal{N}$.
Therefore, if $\epsilon < \frac{2}{1 - \min_n \xi_n^{BR}}$, we have $\xi_n^{JU} \in (-1, 1), \forall n \in \mathcal{N}$ and the Jacobi
update dynamics converges. ■
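The effect of the stepsize can be seen in a small experiment. The sketch below again assumes the affine best-response form consistent with (5.39); the function names and the choice $\lambda_n = r_n/2$ (which violates (5.35) for the parameters used) are illustrative assumptions, not settings taken from the text.

```python
def best_response(a, beta, r, lam, mu):
    """Assumed affine best response B_n(a), consistent with (5.39)."""
    total = sum(rm * am for rm, am in zip(r, a))
    # beta_n * (mu - sum_{m != n} r_m a_m + (lam_n - r_n) a_n) simplifies to
    # beta_n * (mu - total + lam_n * a_n)
    return [bn * (mu - total + ln * an) / (ln * (1.0 + bn))
            for an, bn, ln in zip(a, beta, lam)]


def run_jacobi(a0, beta, r, lam, mu, eps, stages):
    """Iterate a <- a + eps * (B(a) - a); eps = 1 is the pure best response."""
    a = list(a0)
    for _ in range(stages):
        br = best_response(a, beta, r, lam, mu)
        a = [an + eps * (bn - an) for an, bn in zip(a, br)]
    return a


if __name__ == "__main__":
    beta, r, mu = [1.5, 1.0, 0.5], [3.0, 4.0, 5.0], 10.0
    lam = [0.5 * rn for rn in r]   # hypothetical aggressive beliefs, (5.35) fails
    print("eps = 0.3:", run_jacobi([0.5] * 3, beta, r, lam, mu, 0.3, 2000))
    print("eps = 1.0:", run_jacobi([0.5] * 3, beta, r, lam, mu, 1.0, 60))
```

With these parameters the pure best response oscillates with growing amplitude, while the damped update settles at the fixed point, in line with the bound on $\epsilon$ in the proof.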

Remark 5.4 Theorem 5.5 indicates that, for any $\{r_n, \beta_n, \lambda_n\}_{n=1}^N > 0$, the Jacobi
update  mechanism  globally  converges  to  a  CE  as  long  as  the  stepsize  is  set  to 
be  a  small  enough  positive  number.  In  other  words,  the  small  stepsize  in  the 
Jacobi  update  can  compensate  for  the  instability  of  the  best  response  dynamics 
even  though  the  necessary  and  sufficient  condition  in  (5.35)  is  not  satisfied. 


5.4.4  Stability  of  the  Pareto  Boundary 


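In order to understand how to properly choose the parameters $\{\lambda_n\}_{n=1}^N$ such that
they lead to efficient outcomes, we need to explicitly describe the steady-state CE
in terms of the parameters $\{\lambda_n\}_{n=1}^N$ of the belief functions. Denote the joint
action profile at CE as $(a_1^*, \ldots, a_N^*)$. From Equation (5.33), we know that

$$(\lambda_n + \beta_n r_n)\, a_n^* + \sum_{m \in \mathcal{N}\setminus\{n\}} \beta_n r_m a_m^* = \beta_n \mu, \quad \forall n \in \mathcal{N}. \tag{5.39}$$

The solutions of the above linear equations are

$$a_n^{CE} = \frac{\beta_n \mu}{\lambda_n\left(1 + \sum_{m=1}^{N} \frac{\beta_m r_m}{\lambda_m}\right)}, \quad \forall n \in \mathcal{N}. \tag{5.40}$$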
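For completeness, one way to pass from (5.39) to (5.40) is sketched below. The shorthand $S$ for the weighted aggregate action is introduced only for this derivation and does not appear elsewhere in the text.

```latex
% Absorb the n-th term of (5.39) into the sum over all users:
\lambda_n a_n^* + \beta_n S = \beta_n \mu,
\qquad S := \sum_{m=1}^{N} r_m a_m^*
\;\Longrightarrow\;
a_n^* = \frac{\beta_n}{\lambda_n}\,(\mu - S).

% Multiply by r_n and sum over n to solve for S:
S = (\mu - S)\sum_{m=1}^{N} \frac{\beta_m r_m}{\lambda_m}
\;\Longrightarrow\;
\mu - S = \frac{\mu}{1 + \sum_{m=1}^{N} \beta_m r_m / \lambda_m}.

% Substitute back into the expression for a_n^*:
a_n^{CE} = \frac{\beta_n \mu}{\lambda_n \left(1 + \sum_{m=1}^{N} \beta_m r_m / \lambda_m\right)}.
```

As a sanity check, setting $\lambda_n = r_n$ with $\beta = [1.5\ 1\ 0.5]$, $r = [3\ 4\ 5]$ and $\mu = 10$ gives $a_1 = 15/12 = 1.25$, which agrees with the NE action of User 1 in Table 5.1.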


Based  on  the  closed-form  expression  of  the  CE,  the  following  theorem  indicates 
the  stability  of  the  Pareto  boundary  in  Type  II  games. 


Theorem 5.6 For Type II games, all the operating points on the Pareto boundary
are globally convergent CE under the best response dynamics.

Proof: Comparing Equations (5.23) and (5.40), we can see that
$(a_1^{CE}, \ldots, a_N^{CE}) = (a_1^{PB}, \ldots, a_N^{PB})$ if and only if $\lambda_n = r_n/\omega_n$. Substitute it into the
LHS of (5.35):

$$\sum_{n=1}^{N} \frac{r_n\beta_n}{\lambda_n(1+2\beta_n)} = \sum_{n=1}^{N} \frac{\omega_n\beta_n}{1+2\beta_n} < \frac{\sum_{n=1}^{N}\omega_n}{2} = \frac{1}{2}. \tag{5.41}$$

Condition (5.35) is satisfied for all the Pareto-optimal operating points. In fact,
we have $\min_n \xi_n^{BR} = 0$, which is because $q(0) = \sum_{n=1}^{N} \frac{r_n}{\lambda_n} = \sum_{n=1}^{N} \omega_n = 1$.
Therefore, under the best response dynamics, the Pareto boundary is globally
convergent. ■


In addition, we also note that Theorem 5.5 already indicates the stability
of the Pareto boundary under Jacobi update as long as the parameters
$\{r_n, \beta_n, \lambda_n\}_{n=1}^N$ are properly chosen.


Remark 5.5 Since $\sum_{n=1}^{N} \omega_n = 1$, we can see from the previous proof that the
belief configurations $\{\lambda_n\}_{n=1}^N$ lead to Pareto-optimal operating points if and only
if

$$\sum_{n=1}^{N} \frac{r_n}{\lambda_n} = 1. \tag{5.42}$$

Therefore, to achieve Pareto-optimality in these non-cooperative
scenarios, users need to choose the belief parameters $\{\lambda_n\}_{n=1}^N$ to be greater than
or equal to the parameters $\{r_n\}_{n=1}^N$ in the utility functions $\{u_n\}_{n=1}^N$, and the
summation of $\frac{r_n}{\lambda_n}$ should be equal to 1. Define user $n$'s conservativeness as $\frac{r_n}{\lambda_n}$, which
reflects the ratio between the immediate performance degradation $-r_n\Delta a_n$ in the
actual utility function and the long-term effect $-\lambda_n\Delta a_n$ in the conjectured utility
function if user $n$ increases its action by $\Delta a_n$. The condition in Equation (5.42)
indicates that, to achieve efficient outcomes, the non-collaborative users need to
jointly maintain moderate conservativeness by considering the multi-user coupling
and appropriately choosing $\{\lambda_n\}_{n=1}^N$. By "moderate", we mean that users are
neither too aggressive, i.e. $\lambda_n \to r_n$ and $\sum_{n=1}^{N} \frac{r_n}{\lambda_n} \to N$, nor too conservative, i.e.
$\lambda_n \to +\infty$ and $\sum_{n=1}^{N} \frac{r_n}{\lambda_n} \to 0$. If more than one user plays the Nash strategy and
chooses $\lambda_n = r_n$, Equation (5.42) does not hold and the resulting operating point
is not Pareto-optimal. Therefore, myopic selfish behavior is detrimental.

Figure 5.1: The trajectory of the best response and Jacobi update dynamics.


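Similarly as in (5.24), we have

$$\sum_{n=1}^{N} \omega_n \log\frac{u_n(\mathbf{a}^{CE})}{u_n(\mathbf{a}^{PB})} = \sum_{n=1}^{N} \omega_n\beta_n \log\frac{r_n\left(1 + \sum_{j=1}^{N} \omega_j\beta_j\right)}{\lambda_n\omega_n\left(1 + \sum_{j=1}^{N} \frac{r_j\beta_j}{\lambda_j}\right)} + \log\frac{1 + \sum_{j=1}^{N} \omega_j\beta_j}{1 + \sum_{j=1}^{N} \frac{r_j\beta_j}{\lambda_j}}. \tag{5.43}$$

Using Jensen's inequality, we can conclude that $\sum_{n=1}^{N} \omega_n \log\frac{u_n(\mathbf{a}^{CE})}{u_n(\mathbf{a}^{PB})} \le 0$, and
$\sum_{n=1}^{N} \omega_n \log\frac{u_n(\mathbf{a}^{CE})}{u_n(\mathbf{a}^{PB})} = 0$ if and only if $\omega_n = \frac{r_n}{\lambda_n}, \forall n$. Therefore, if a CE is Pareto
efficient, user $n$'s conservativeness $r_n/\lambda_n$ corresponds to the weight assigned to
user $n$ in the weighted proportional fairness defined in (5.8).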


As an illustrative example, we simulate a three-user system with parameters
$\beta = [1.5\ 1\ 0.5]$, $r = [3\ 4\ 5]$, $\mu = 10$, $\omega_n = \frac{1}{3}, \forall n$. In this case, the joint actions
and the corresponding utilities at NE and Pareto boundary are summarized in
Table 5.1.

Table 5.1: Actions and payoffs at NE and Pareto boundary.

              User 1    User 2    User 3
  a_n^{NE}    1.25      0.625     0.25
  u_n^{NE}    3.4939    1.5625    1.25
  a_n^{PB}    0.833     0.417     0.167
  u_n^{PB}    3.8036    2.0833    2.0412

The price of anarchy quantified according to (5.27) is $-0.2877$ and the
lower bound in (5.27) is $-0.5754$. As discussed in Section III.C, both the upper
bound and lower bound in (5.27) are not tight. Fig. 5.1 shows the trajectory
of the action updates under both best response and Jacobi update dynamics, in
which $a_n^0 = 0.5$, $\lambda_n = r_n/\omega_n, \forall n$, and $\epsilon = 0.5$. The best response update converges to
the Pareto-optimal operating point in around 8 iterations, whereas the Jacobi update
follows a smoother trajectory and attains the same equilibrium after more
iterations.
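The entries of Table 5.1 can be reproduced from the closed form (5.40). The sketch below rests on two assumptions that are not stated in this section: the Type II utility has the form $u_n = a_n^{\beta_n}(\mu - \sum_m r_m a_m)$, and the efficiency metric is $\sum_n \omega_n \log(u_n^{NE}/u_n^{PB})$. Both are at least consistent with the numbers in the table and with the reported value $-0.2877$. Function names are hypothetical.

```python
import math


def conjectural_equilibrium(beta, r, lam, mu):
    """Closed-form CE of (5.40)."""
    coupling = sum(bm * rm / lm for bm, rm, lm in zip(beta, r, lam))
    return [bn * mu / (ln * (1.0 + coupling)) for bn, ln in zip(beta, lam)]


def utilities(a, beta, r, mu):
    """ASSUMED utility form: u_n = a_n^beta_n * (mu - sum_m r_m a_m)."""
    residual = mu - sum(rm * am for rm, am in zip(r, a))
    return [an ** bn * residual for an, bn in zip(a, beta)]


def weighted_log_gap(u_ne, u_pb, omega):
    """ASSUMED efficiency metric: sum_n omega_n * log(u_n^NE / u_n^PB)."""
    return sum(w * math.log(x / y) for w, x, y in zip(omega, u_ne, u_pb))


if __name__ == "__main__":
    beta, r, mu = [1.5, 1.0, 0.5], [3.0, 4.0, 5.0], 10.0
    omega = [1.0 / 3.0] * 3
    a_ne = conjectural_equilibrium(beta, r, r, mu)            # lambda_n = r_n
    a_pb = conjectural_equilibrium(
        beta, r, [rn / wn for rn, wn in zip(r, omega)], mu)   # lambda_n = r_n / omega_n
    u_ne, u_pb = utilities(a_ne, beta, r, mu), utilities(a_pb, beta, r, mu)
    print(a_ne, u_ne)
    print(a_pb, u_pb)
    print(weighted_log_gap(u_ne, u_pb, omega))
```

Under these assumptions the gap evaluates to $\log 0.75$, which rounds to the figure quoted above.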

5.5  Concluding  Remarks 

We derive the structure of the utility functions in the multi-user communication
scenarios where a user's action has proportionally the same impact over other
users' utilities. We define two basic types of linearly coupled games and
investigate the properties of Type II games. The performance gap between the NE and
the Pareto boundary of the utility region is explicitly characterized. To improve the
performance in non-cooperative cases, we investigate a CE approach that
endows users with simple linear beliefs, which enable them to select an efficient
equilibrium outcome without the need for explicit message exchanges. The
properties of the CE under both the best response and Jacobi update
mechanisms are characterized. We show that every point on the Pareto boundary in
linearly coupled games is a globally convergent CE that can be achieved by both
studied dynamic algorithms without real-time message passing. A
potential future direction is to extend the CE approach to non-linearly
coupled multi-user communication scenarios.

CHAPTER 6


Dynamic Conjectures in Random Access Networks


6.1  Introduction 

As discussed in the previous chapter, the multi-user interaction in random
access communication networks can be modeled as Type I linearly coupled games.
It  is  well-known  that  myopic  selfish  behavior  is  detrimental  in  random  access 
communication  networks  [CGA05].  This  chapter  is  concerned  with  developing 
distributed  algorithms  in  random  access  communication  networks  to  improve  the 
throughput  efficiency  from  the  game-theoretic  perspective.  To  avoid  a  network 
collapse  and  encourage  cooperation,  we  adopt  the  conjecture-based  model  and 
enable  the  cognitive  communication  devices  to  build  belief  models  about  how 
their  competitors’  reactions  vary  in  response  to  their  own  action  changes.  The 
belief  functions  of  the  wireless  devices  are  inspired  by  the  concept  of  reciprocity, 
which  refers  to  interaction  mechanisms  in  which  the  emergence  of  cooperative  be¬ 
havior  is  favored  by  the  probability  of  future  mutual  interactions  [Smi82,  Now06]. 
Specifically,  by  deploying  such  a  behavior  model,  devices  will  no  longer  adopt  my¬ 
opic,  selfish,  behaviors,  but  rather  they  will  form  beliefs  about  how  their  actions 
will  influence  the  responses  of  their  competitors  and,  based  on  these  beliefs,  they 
will try to maximize their own welfare. The steady state of such a play among
belief-forming devices can be characterized as a conjectural equilibrium (CE). At
the equilibrium, devices compensate for their lack of information by forming an
internal representation of the opponents' behavior and preferences, and using
these "conjectured responses" in their personal optimization program [FJQ04].
More  importantly,  we  show  that  the  reciprocity  among  these  self-interested  de¬ 
vices  can  be  sustained. 

In  particular,  the  main  contributions  of  this  chapter  are  as  follows.  First,  to 
cultivate  cooperation  in  random  access  networks,  we  enable  self-interested  au¬ 
tonomous  nodes  to  form  independent  linear  beliefs  about  how  their  rival  actions 
vary  as  a  function  of  their  own  actions.  We  design  two  simple  distributed  algo¬ 
rithms  in  which  all  the  nodes’  beliefs  and  actions  will  be  revised  by  observing  the 
outcomes  of  past  mutual  interaction  over  time.  Both  conjecture-based  algorithms 
require  little  information  exchange  among  different  nodes  and  the  internal  com¬ 
putation  for  each  node  is  very  simple.  For  both  algorithms,  we  investigate  the 
stability  of  different  operating  points  and  derive  sufficient  conditions  that  guar¬ 
antee  their  global  convergence,  thereby  establishing  the  connection  between  the 
dynamic  belief  update  procedures  and  the  steady-state  CE.  We  prove  that  all  the 
operating  points  in  the  throughput  region  are  stable  CE  and  reciprocity  can  be 
eventually  sustained  via  the  proposed  evolution.  We  also  provide  an  engineering 
interpretation  of  the  proposed  design  to  clarify  the  similarities  and  differences 
between  the  proposed  algorithms  and  existing  protocols,  e.g.  the  IEEE  802.11 
DCF. 

Second,  we  investigate  the  relationship  between  the  parameter  initialization  of 
beliefs  and  Pareto-efficiency  of  the  achieved  CE.  In  the  economic  market  context, 
it  has  been  shown  that  adjustment  processes  based  on  conjectures  and  individual 
optimization may sometimes be driven to Pareto-optimality [JT06]. To the best of
our knowledge, this is the first attempt to investigate the Pareto efficiency of the
conjecture-based approach in communication networks. Importantly, it is shown
that,  regardless  of  the  number  of  nodes,  there  always  exist  certain  belief  configura¬ 
tions  such  that  the  proposed  distributed  algorithms  can  operate  arbitrarily  close 
to  the  Pareto  boundary  of  the  throughput  region  while  approximately  maintain¬ 
ing  the  weighted  fairness  across  the  entire  network.  Our  investigation  provides 
useful  insights  that  help  to  define  convergent  dynamic  adaptation  schemes  that 
are  apt  to  drive  distributed  random  access  networks  towards  efficient,  stable,  and 
fair  configurations. 

The  rest  of  this  chapter  is  organized  as  follows.  Section  6.2  presents  the  system 
model  of  random  access  networks,  reviews  the  existing  game  theoretic  solutions, 
and  introduces  the  concept  of  CE.  Section  6.3  develops  two  simple  distributed 
algorithms  in  which  nodes  form  dynamic  conjectures  and  optimize  their  actions 
based  on  their  conjectures.  The  stability  of  different  CE  and  the  condition  of 
global  convergence  are  established.  This  section  also  shows  that  nodes’  conjec¬ 
tures  can  be  configured  to  stably  operate  at  any  point  that  is  arbitrarily  close 
to  the  Pareto  frontier  in  throughput  region.  Section  6.4  addresses  the  topics  of 
equilibrium  selection  in  heterogeneous  networks  and  presents  possible  extension 
to  ad-hoc  networks.  Numerical  simulations  are  provided  in  Section  6.5  to  com¬ 
pare  the  proposed  algorithms  with  the  IEEE  802.11  DCF  protocol  and  P-MAC 
protocol. We will compare the similarities and differences between Type I and
II linearly coupled games in Section 6.6. Concluding remarks are drawn in Section
6.7.

Figure 6.1: System model of a single cell.

6.2  System  Description 

In  this  section,  we  describe  the  system  model  of  random  access  networks,  define 
the  investigated  random  access  game,  and  discuss  the  existing  game-theoretic 
solutions. 

6.2.1  System  Model  of  Random  Access  Networks 

Following  [LTH07,  CCL08],  we  model  the  interaction  among  multiple  autonomous 
wireless  nodes  in  random  access  networks  as  a  random  access  game. 

As shown in Fig. 6.1, consider a set $\mathcal{K} = \{1, 2, \ldots, K\}$ of wireless nodes,
where each node represents a transmitter-receiver pair (link). We define $Tx_k$ as the
transmitter node of link $k$ and $Rx_k$ as the receiver node of link $k$. We first assume
a single-cell wireless network, where every node can hear every other node in the
network, and we will address the ad-hoc network scenario in Section 6.4.2. The
system operates in discrete time with evenly spaced time slots [MM85, GS06].
We assume that all nodes always have a data packet to transmit at each time
slot (i.e. we investigate the saturated traffic scenario¹), and the network is noise
free and packet loss occurs only due to collision. The action of a node in this
game is to select its transmission probability, and a node $k$ will independently
attempt transmission of a packet with transmit probability $p_k$. The action set
available to node $k$ is $P_k = [0, 1]$ for all $k \in \mathcal{K}$². Once the nodes decide their
transmission probabilities based on which they transmit their packets, an action
profile is determined. We denote the action profile in the random access game as
a vector $\mathbf{p} = (p_1, \ldots, p_K)$ in $P = P_1 \times \cdots \times P_K$. Then the throughput of node $k$
is given by³

$$u_k(\mathbf{p}) = p_k \prod_{i \in \mathcal{K}\setminus\{k\}} (1 - p_i). \tag{6.1}$$

To capture the performance tradeoff in the network, the throughput (payoff)
region is defined as $\mathcal{T} = \{(u_1(\mathbf{p}), \ldots, u_K(\mathbf{p})) \mid \exists\, \mathbf{p} \in P\}$. The random access
game can be formally defined by the tuple $\Gamma = (\mathcal{K}, (P_k), (u_k))$ [FT91]. Denote
the transmission probability for all nodes but $k$ by $\mathbf{p}_{-k} = (p_1, \ldots, p_{k-1}, p_{k+1},
\ldots, p_K)$. From (6.1), we can see that node $k$'s throughput depends not only
on its own transmission probability $p_k$, but also on the other nodes' transmission
probabilities $\mathbf{p}_{-k}$.

¹ This chapter focuses on the saturated system because we are interested in throughput
maximization. The analysis can be extended to investigate the non-saturated networks where
the incoming packets of the individual nodes' queues arrive at finite rates.

² The action set can be alternatively defined to be $P_k = [P_k^{min}, P_k^{max}]$ and the analysis in
this chapter still applies.

³ This throughput model assumes that time is slotted and all packets are of equal length. We
use this model for theoretic analysis. The throughput of the scenarios in which packet lengths
are not equal, e.g. the IEEE 802.11 DCF, will be addressed in Section 6.5.

6.2.2 Existing Solutions

The throughput tradeoff and stability of random access networks have been
extensively studied from the game theoretic perspective [JK02, LTH07, CCL08,
CGA05, LCC07, MHC09, MMR09]. This subsection briefly reviews these existing results
and highlights the advantages and disadvantages of different approaches.

In  the  random  access  game,  one  of  the  most  investigated  problems  is  whether 
or  not  a  Nash  equilibrium  exists.  The  NE  of  the  investigated  random  access  game 
has  been  addressed  in  the  similar  context  of  CSMA/CA  networks  where  selfish 
nodes  deliberately  control  their  random  deferment  by  altering  their  contention 
windows [CGA05]. Specifically, the transmission probability $p_k$ in our model can
be related to the contention window $CW_k$ in the CSMA/CA protocol, where
$p_k = \frac{2}{1+CW_k}$. It has been shown in [CGA05] that at the NE, at least one selfish
node will set $CW_k = 1$ (i.e. always transmit). If more than one selfish node
sets its contention window to 1, it will cause zero throughput for all the nodes
in the system. This kind of result is known as the tragedy of the commons. We
can see that myopic selfish behavior is detrimental in random access scenarios
and  novel  mechanisms  are  required  to  encourage  cooperative  behavior  among  the 
self-interested  devices.  In  addition,  the  existence  of  and  convergence  to  the  NE 
in  random  access  games  have  been  studied  also  in  other  scenarios,  where  indi¬ 
vidual  nodes  have  utility  functions  that  are  different  from  (6.1)  [JK02,  LTH07]. 
For  example,  the  nodes  in  [JK02]  adjust  their  transmission  probabilities  in  an 
attempt  to  attain  their  desired  throughputs.  A  local  utility  function  is  found  for 
exponential  backoff-based  MAC  protocols,  based  on  which  these  protocols  can 
be  reverse-engineered  in  order  to  stabilize  the  network  [LTH07].  However,  due 
to  the  inadequate  coordination  or  feedback  mechanism  in  these  protocols,  Pareto 
optimality  of  the  throughput  performance  cannot  be  guaranteed. 
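The tragedy of the commons described above follows directly from the throughput expression (6.1). The sketch below evaluates (6.1) for a few action profiles; the mapping from contention window to transmission probability, $p_k = 2/(1 + CW_k)$, is used here as an assumption, and the window value 15 is an arbitrary illustrative choice.

```python
from math import prod


def throughput(p):
    """Per-node throughput u_k(p) = p_k * prod_{i != k} (1 - p_i), as in (6.1)."""
    return [pk * prod(1.0 - pi for i, pi in enumerate(p) if i != k)
            for k, pk in enumerate(p)]


def transmit_probability(cw):
    """ASSUMED constant-window relation p = 2 / (1 + CW)."""
    return 2.0 / (1.0 + cw)


if __name__ == "__main__":
    polite = transmit_probability(15)       # hypothetical well-behaved window
    selfish = transmit_probability(1)       # CW = 1, i.e. always transmit
    print(throughput([polite, polite, polite]))
    print(throughput([selfish, polite, polite]))   # one selfish node captures the channel
    print(throughput([selfish, selfish, polite]))  # two selfish nodes: zero for everyone
```

A single always-transmitting node starves the others while keeping a positive throughput for itself, and a second one drives every node's throughput to zero.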

Several  recent  works  also  investigate  how  to  design  new  distributed  algorithms 
that  provably  converge  to  the  Pareto  boundary  of  the  network  throughput  re¬ 
gion  [CGA05,  LCC07,  MHC09].  A  distributed  protocol  is  proposed  in  [CGA05] 
to guide multiple selfish nodes to a Pareto-optimal NE by including penalties
into their utility functions. However, the penalties must be carefully chosen.
In  [LCC07],  the  utility  maximization  is  solved  using  the  dual  decomposition 
technique  by  enabling  nodes  to  cooperatively  exchange  coordination  information 
among  each  other.  Furthermore,  it  is  shown  in  [MHC09]  that  network  utility  max¬ 
imization  in  random  access  networks  can  be  achieved  without  real-time  message 
passing  among  nodes.  The  key  idea  is  to  estimate  the  other  nodes’  transmis¬ 
sion  probabilities  from  local  observations,  which  in  fact  increases  the  internal 
computational  overhead  of  individual  nodes. 

As  discussed  before,  the  goal  of  this  chapter  is  to  design  a  simple  distributed 
random  access  algorithm  that  requires  limited  information  exchanges  among 
nodes  and  also  stabilizes  the  entire  network.  More  importantly,  this  algorithm 
should  be  capable  of  achieving  high  efficiency  and  of  differentiating  among  het¬ 
erogeneous  nodes  carrying  various  traffic  classes  with  different  quality  of  service 
requirements.  As  we  will  show  later,  the  game-theoretic  concept  of  conjectural 
equilibrium  defined  in  (4.1)  provides  such  an  elegant  solution. 

6.3  Distributed  Algorithms 

By  the  definition  of  CE,  all  nodes’  expectations  based  on  their  beliefs  are  realized 
and  each  node  behaves  optimally  according  to  its  expectation.  In  other  words, 
nodes’  beliefs  are  consistent  with  the  outcome  of  the  play  and  they  behave  op¬ 
timally  with  respect  to  their  beliefs.  The  key  challenges  are  how  to  configure 
the  belief  functions  such  that  reciprocal  behavior  is  encouraged  and  how  to  de¬ 
sign  the  evolution  rules  such  that  the  network  can  dynamically  converge  to  a 
CE  having  satisfactory  performance.  In  this  section,  to  promote  reciprocity,  we 
design a prescribed rule for each node to configure its belief about its expected
contention of the wireless network as a linear function of its own transmission
probability. It is shown that all the achievable operating points in the throughput
region $\mathcal{T}$ are CE by deploying these belief functions. Furthermore, we propose
two distributed algorithms for these nodes to dynamically achieve the CE. We
provide the sufficient conditions that guarantee the stability and convergence of
the CE. We also discuss the similarities and differences between the proposed
algorithms and the existing well-known protocols. Finally, it is proven that any
Pareto-inefficient operating point is a stable CE, i.e. we can approach arbitrarily
close to the Pareto frontier of the throughput region $\mathcal{T}$.

6.3.1  Individual  Behavior 

As discussed before, both the state space and belief functions need to be defined
in order to investigate the existence of CE. In the random access game, we define
the state $s_k = \prod_{i \in \mathcal{K}\setminus\{k\}} (1 - p_i)$ to be the contention measure signal representing
the probability that all nodes except node $k$ do not transmit. This is because,
besides its own transmission probability, its throughput only depends on the
probability that the remaining nodes do not transmit. We can see that state
$s_k$ indicates the aggregate effects of the other nodes' joint actions on node $k$'s
payoff. In practice, it is hard for wireless nodes to compute the exact transmission
probabilities of their opponents [MHC09]. Therefore, we assume that $s_k$ is the
only information that node $k$ has about the contention level of the entire network,
because it is a metric that node $k$ can easily compute based on local observations.
Specifically, from node $k$'s viewpoint, the probability of experiencing an idle time
slot is $p_k^{idle} = (1 - p_k) s_k$. Let $n_k^{idle}$ denote the number of time slots between any
two consecutive idle time slots. $n_k^{idle}$ has an independent identically distributed
geometric distribution with probability $p_k^{idle}$. Therefore, we have $p_k^{idle} = 1/(1 +
\bar{n}_k^{idle})$, where $\bar{n}_k^{idle}$ is the mean value of $n_k^{idle}$ and can be locally estimated by node $k$
through its observation of the channel contention history. Since node $k$ knows its
own transmission probability $p_k$, it can estimate $s_k$ using $s_k = 1/[(1 + \bar{n}_k^{idle})(1 - p_k)]$.
Notice that the action available to node $k$ is to choose the transmission probability
$p_k \in P_k$. By the definition of belief function, we need to express the expected
contention measure $\tilde{s}_k$ as a function of its own transmission probability $p_k$. The
simplest approach is to deploy linear belief models, i.e. node $k$'s belief function
takes the form

$$\tilde{s}_k(p_k) = \bar{s}_k - a_k(p_k - \bar{p}_k), \tag{6.2}$$

for $k \in \mathcal{K}$. The values of $\bar{s}_k$ and $\bar{p}_k$ are specific states and actions, called
reference points [JT06], and $a_k$ is a positive scalar. In other words, node $k$ assumes
that other nodes will observe its deviation from its reference point $\bar{p}_k$ and that the
aggregate contention probability deviates from the reference point $\bar{s}_k$ by a
quantity proportional to the deviation $p_k - \bar{p}_k$. How to configure $\bar{s}_k$, $\bar{p}_k$, and $a_k$ will
be addressed in the rest of this chapter. The reasons why we focus on the linear
beliefs represented in (6.2) are two-fold. First, the linear form represents the
simplest model based on which a user can model the impact of its environment.
As we will show later in Section 6.3.5, building and optimizing over such simple
beliefs is sufficient for the network to achieve almost any operating point in the
throughput region as a stable CE. Second, the conjecture functions deployed by
the wireless users are based on the concept of reciprocity [Smi82, Now06], which
was developed in evolutionary biology, and refers to interaction mechanisms in
which the evolution of cooperative behavior is favored by the probability of
future mutual interactions. Similarly, in single-hop wireless networks, the devices
repeatedly interact when accessing the channel. If they disregard the fact that
they have a high probability to interact in the future, they will act myopically,
which will lead to a tragedy of the commons (the zero-payoff Nash equilibrium).
However, if they recognize that their probability of interacting in the future is
high, they will consider their impact on the network state, which is captured in
the belief function by the positive $a_k$.
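The local estimation of the contention measure described above can be sketched with a small slotted-channel simulation. Everything below is illustrative: the function name, the transmission probabilities, the number of slots and the fixed random seed are assumptions made for this example only.

```python
import random


def estimate_contention(p, k, slots=200_000, seed=0):
    """Estimate s_k = prod_{i != k}(1 - p_i) from observed gaps between idle slots.

    Node k only uses the mean number of busy slots between consecutive idle
    slots and its own transmission probability p[k].
    """
    rng = random.Random(seed)
    gap_sum, gap_count = 0, 0
    busy_since_idle = 0
    seen_idle = False
    for _ in range(slots):
        # Each node transmits independently with its own probability.
        transmitting = [rng.random() < pi for pi in p]
        if not any(transmitting):          # idle slot: nobody transmits
            if seen_idle:
                gap_sum += busy_since_idle
                gap_count += 1
            seen_idle = True
            busy_since_idle = 0
        else:
            busy_since_idle += 1
    if gap_count == 0:
        raise ValueError("not enough idle slots observed")
    mean_gap = gap_sum / gap_count
    p_idle = 1.0 / (1.0 + mean_gap)        # geometric gap with success prob p_idle
    return p_idle / (1.0 - p[k])           # s_k = p_idle / (1 - p_k)


if __name__ == "__main__":
    p = [0.2, 0.3, 0.1]
    print(estimate_contention(p, 0), (1 - 0.3) * (1 - 0.1))
```

The estimate uses no knowledge of the other nodes' individual probabilities, which is the point of working with the aggregate state $s_k$.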

The goal of node $k$ is to maximize its expected throughput $p_k \cdot \tilde{s}_k(p_k)$ taking
into account the conjectures that it has made about the other nodes. Therefore,
the optimization a node needs to solve becomes:

$$\max_{p_k \in P_k} \; p_k \left[\bar{s}_k - a_k(p_k - \bar{p}_k)\right], \tag{6.3}$$

where the second factor is the expected contention measure $\tilde{s}_k(p_k)$ if node $k$
transmits with probability $p_k$. The product of $p_k$ and $\tilde{s}_k(p_k)$ gives the expected
throughput for $p_k \in P_k$. For $a_k > 0$, node $k$ believes that increasing its
transmission probability will increase its experienced contention probability. The optimal
solution of (6.3) is given by

$$p_k^* = \min\left\{\frac{\bar{s}_k}{2a_k} + \frac{\bar{p}_k}{2},\; 1\right\}. \tag{6.4}$$


In  the  following,  we  first  show  that  forming  simple  linear  beliefs  in  (6.2)  can 
cause  all  the  operating  points  in  the  achievable  throughput  region  to  be  CE. 


Theorem 6.1 All the operating points in the throughput region $\mathcal{T}$ are conjectural
equilibria.

Proof: For each operating point $(\tau_1, \ldots, \tau_K)$ in the throughput region $\mathcal{T}$,
there exists at least one joint action profile $(p_1^*, \ldots, p_K^*) \in P$ such that $\tau_k = u_k(p^*)$,
$\forall k \in \mathcal{K}$. We consider setting the parameters in the belief functions to be:

$$a_k^* = \frac{\prod_{i \in \mathcal{K}\setminus\{k\}} (1 - p_i^*)}{p_k^*}. \tag{6.5}$$

It is easy to check that, if the reference points are $\bar{s}_k = \prod_{i \in \mathcal{K}\setminus\{k\}} (1 - p_i^*)$ and $\bar{p}_k = p_k^*$,
we have $\tilde{s}_k(p_k^*) = s_k(p_1^*, \ldots, p_K^*)$ and $p_k^* = \arg\max_{p_k \in P_k} u_k(\tilde{s}_k(p_k), p_k)$. Indeed,
by (6.4) the best response is $\bar{p}_k/2 + \bar{s}_k/(2a_k^*) = p_k^*/2 + p_k^*/2 = p_k^*$. Therefore,
this configuration of the belief functions and the joint action $p^* = (p_1^*, \ldots, p_K^*)$
constitute the CE that results in the throughput $(\tau_1, \ldots, \tau_K)$.$^{4}$ ■

Theorem 6.1 establishes the existence of CE, i.e. for a particular $p^* \in P$, how
to choose the parameters $\{\bar{s}_k, \bar{p}_k, a_k\}_{k=1}^K$ such that $p^*$ is a CE. However, it neither
tells us how these CE can be achieved and sustained in the dynamic setting nor
clarifies how different belief configurations can result in various CE.

In distributed scenarios, nodes learn when they modify their conjectures based
on their new observations. Specifically, we first allow the nodes to revise their
reference points based on their past local observations. Let $s_k^t$, $p_k^t$, $\tilde{s}_k^t$, $\bar{s}_k^t$, $\bar{p}_k^t$ be
user $k$'s state, transmission probability, belief function, and reference points at
stage $t$,$^{5}$ in which $s_k^t = \prod_{i \in \mathcal{K}\setminus\{k\}} (1 - p_i^t)$. We propose a simple rule for individual
nodes to update their reference points. At stage $t$, node $k$ sets its $\bar{s}_k^t$ and $\bar{p}_k^t$ to be
$s_k^{t-1}$ and $p_k^{t-1}$. In other words, node $k$'s conjectured utility function at stage $t$ is

$$u_k^t\big(\tilde{s}_k^t(p_k), p_k\big) = p_k \Big[ \prod_{i \in \mathcal{K}\setminus\{k\}} (1 - p_i^{t-1}) - a_k (p_k - p_k^{t-1}) \Big]. \tag{6.6}$$

The remainder of this chapter will investigate the dynamic properties of the resulting
operating points and the performance trade-off among multiple competing
nodes. In particular, for fixed $\{a_k\}_{k=1}^K$, Sections 6.3.2 and 6.3.3 will embed the
above individual optimization scheme in two different distributed learning processes
in which all the nodes update their transmission probabilities over time.
Section 6.3.5 further allows individual nodes to adaptively update their parameters

4By  the  definition  of  CE,  the  configuration  of  the  linear  belief  functions  is  a  key  part  of  CE. 
Since  this  chapter  focuses  on  the  linear  belief  functions  defined  in  (6.2),  we  will  simply  state 
the  joint  action  p*  is  a  CE  hereafter  for  the  ease  of  presentation. 

5This chapter assumes the persistence mechanism for contention resolution except in Section
6.3.4. In the persistence mechanism, each wireless node maintains a persistence probability and
accesses the channel with this probability [NKG00]. A stage contains multiple time slots. The
nodes estimate the contention level in the network and update their persistence probabilities
in a "stage-by-stage" manner. The superscript $t$ in this chapter represents the numbering of
the stages unless otherwise specified.

Table 6.1: Algorithm 6.1: A Distributed Best Response Algorithm for Random Access.

Initialize: $t = 0$, the transmission probability $p_k^0 \in [0, 1]$, and the parameter
    $a_k > 0$ in node $k$'s belief function, $\forall k \in \mathcal{K}$.
procedure
    Repeat
        Locally at each node $k$, iterate through $t$:
        Set $t \leftarrow t + 1$.
        for all $k \in \mathcal{K}$ do
            At stage $t$, $p_k^t \leftarrow \min\big\{ p_k^{t-1}/2 + \prod_{i \in \mathcal{K}\setminus\{k\}} (1 - p_i^{t-1}) / (2 a_k),\; 1 \big\}$.
        end for
        Node $k$ decides if it will transmit data with a probability $p_k^t$ (or
        equivalently, maintain a window size of $CW_k = 2/p_k^t - 1$) for all the time
        slots during stage $t$.
end procedure


$\{a_k\}_{k=1}^K$ such that desired efficiency can be attained. For given $\{a_k\}_{k=1}^K$, Section
6.4.1 will derive a quantitative description of the resulting CE $p^*$.


6.3.2  A  Best  Response  Algorithm 


Our  first  algorithm  adopts  the  simplest  update  mechanism  in  which  each  node 
adjusts  its  transmission  probability  using  the  best  response  that  maximizes  its 
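conjectured utility function (6.6). Therefore, at stage $t$, node $k$ chooses a
transmission probability

$$p_k^t = \arg\max_{p_k \in P_k} u_k^t\big(\tilde{s}_k^t(p_k), p_k\big) = \min\left\{ \frac{p_k^{t-1}}{2} + \frac{\prod_{i \in \mathcal{K}\setminus\{k\}} (1 - p_i^{t-1})}{2 a_k},\; 1 \right\}. \tag{6.7}$$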


Figure 6.2: An illustration of the distributed algorithms.

The detailed description of the entire distributed best response procedure is
summarized in Algorithm 6.1 and it is also pictorially illustrated in Fig. 6.2. Next,
we are interested in deriving the limiting behavior, e.g. stability and convergence,
of this algorithm. For ease of illustration, the sufficient conditions for stability
and convergence throughout this chapter are expressed in terms of $\{p_k\}_{k=1}^K$ and
$\{a_k\}_{k=1}^K$, respectively. The mapping from $\{p_k\}_{k=1}^K$ to $\{a_k\}_{k=1}^K$ is given in (6.5) and
the mapping from $\{a_k\}_{k=1}^K$ to $\{p_k\}_{k=1}^K$ will be addressed in Section 6.4.1.
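The best response procedure of Algorithm 6.1 can be sketched in a few lines. The following is a minimal simulation, assuming synchronous updates and perfect knowledge of the previous-stage contention-free probability at every node; the function names, the stopping tolerance, and the numerical values of $\{a_k\}$ and the initial probabilities are assumptions made for illustration.

```python
from math import prod


def best_response_step(p, a):
    """One stage of update (6.7), applied simultaneously at every node."""
    K = len(p)
    new_p = []
    for k in range(K):
        # Contention-free probability seen by node k in the previous stage.
        s_k = prod(1.0 - p[i] for i in range(K) if i != k)
        new_p.append(min(p[k] / 2.0 + s_k / (2.0 * a[k]), 1.0))
    return new_p


def run_best_response(p0, a, tol=1e-12, max_stages=10000):
    """Iterate (6.7) until successive iterates differ by less than tol."""
    p = list(p0)
    for stage in range(1, max_stages + 1):
        nxt = best_response_step(p, a)
        if max(abs(x - y) for x, y in zip(nxt, p)) < tol:
            return nxt, stage
        p = nxt
    return p, max_stages


def throughput(p):
    """Per-node throughput u_k = p_k * prod_{i != k} (1 - p_i)."""
    K = len(p)
    return [p[k] * prod(1.0 - p[i] for i in range(K) if i != k)
            for k in range(K)]


if __name__ == "__main__":
    a = [3.0, 3.0, 3.0]                    # hypothetical belief slopes
    p_star, stages = run_best_response([0.9, 0.1, 0.5], a)
    print("limit point:", p_star, "after", stages, "stages")
    print("throughput :", throughput(p_star))
```

With identical slopes the limit point is symmetric and satisfies $a_k p_k^* = \prod_{i \neq k}(1 - p_i^*)$, which is the fixed-point relation in (6.5).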


6.3.2.1 Local Stability

Although Theorem 6.1 indicates that all the points in $\mathcal{T}$ are CE, they may not
necessarily be stable. An unstable equilibrium is not desirable, because any small
perturbation might cause the sequence of iterates to move away from the initial
equilibrium. The following theorem describes a subset in $P$ in which all the points
are stable CE.


Theorem 6.2 For any $p^* = (p_1^*, \ldots, p_K^*) \in P$, if

$$\sum_{k=1}^{K} p_k^* < 1, \quad \text{or} \quad \sum_{i \in \mathcal{K}\setminus\{k\}} \frac{p_k^*}{1 - p_i^*} < 1, \; \forall k \in \mathcal{K}, \tag{6.8}$$

$p^*$ is a stable CE for Algorithm 6.1.


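Proof: To analyze the stability of different CE, we consider the Jacobian
matrix of the self-mapping function in (6.7). Let $J_{ik}$ denote the element at row $i$
and column $k$ of the Jacobian matrix $J$. If $p_k^{t-1}/2 + \prod_{i \in \mathcal{K}\setminus\{k\}} (1 - p_i^{t-1})/(2a_k) < 1$,
the Jacobian matrix $J^{BR}$ of (6.7) is defined as:

$$J_{ik}^{BR} = \frac{\partial p_i^t}{\partial p_k^{t-1}} = \begin{cases} \dfrac{1}{2}, & \text{if } i = k, \\[4pt] -\dfrac{1}{2a_i} \prod_{j \in \mathcal{K}\setminus\{i,k\}} (1 - p_j^{t-1}), & \text{if } i \neq k. \end{cases} \tag{6.9}$$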

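As proven in Theorem 6.1, for $p^* = (p_1^*, \ldots, p_K^*) \in P$ to be a fixed point of the
self-mapping function in (6.7), $a_k$ must be set to be $a_k^* = \prod_{i \in \mathcal{K}\setminus\{k\}} (1 - p_i^*)/p_k^*$. It
follows that

$$J_{ik}^{BR}\Big|_{p = p^*,\, a = a^*} = \begin{cases} \dfrac{1}{2}, & \text{if } i = k, \\[4pt] -\dfrac{p_i^*}{2(1 - p_k^*)}, & \text{if } i \neq k. \end{cases} \tag{6.10}$$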

$p^*$ is stable if and only if the eigenvalues $\{\lambda_k\}_{k=1}^K$ of matrix $J^{BR}$ in (6.10) are all
inside the unit circle of the complex plane, i.e. $|\lambda_k| < 1$, $\forall k \in \mathcal{K}$.

From the Gershgorin circle theorem [HJ81], all the eigenvalues $\{\lambda_k\}_{k=1}^K$ of $J^{BR}$ are
located in the region

$$\bigcup_{k=1}^{K} \Big\{ |\lambda - J_{kk}^{BR}| \le \sum_{i \in \mathcal{K}\setminus\{k\}} |J_{ki}^{BR}| \Big\} \;\cap\; \bigcup_{k=1}^{K} \Big\{ |\lambda - J_{kk}^{BR}| \le \sum_{i \in \mathcal{K}\setminus\{k\}} |J_{ik}^{BR}| \Big\}.$$

Noting that $J_{kk}^{BR} = 1/2$, these regions can be further simplified as

$$\bigcup_{k=1}^{K} \Big\{ \big|\lambda - \tfrac{1}{2}\big| \le \sum_{i \in \mathcal{K}\setminus\{k\}} \frac{p_k^*}{2(1 - p_i^*)} \Big\} \;\cap\; \bigcup_{k=1}^{K} \Big\{ \big|\lambda - \tfrac{1}{2}\big| \le \sum_{i \in \mathcal{K}\setminus\{k\}} \frac{p_i^*}{2(1 - p_k^*)} \Big\}.$$

If either condition in (6.8) is satisfied, all the eigenvalues of $J^{BR}$ must fall into
the region $|\lambda - \frac{1}{2}| < \frac{1}{2}$, which is located within the unit circle $|\lambda| < 1$. Therefore,
$p^*$ is a stable CE. ■

Remark 6.1 $p_k^*/(1 - p_i^*)$ can be interpreted as the worst case probability that
node $k$ occupies the channel given that node $i$ does not transmit. This metric
reflects, from node $k$'s perspective, the impact that node $i$'s evacuation has on the
overall congestion of the channel. Therefore, the sufficient conditions in (6.8)
mean that if the system is not overcrowded from all the nodes' perspectives,
the corresponding CE is stable. We can see from Theorem 6.2 that lowering
the transmission probabilities helps to stabilize the random access network. The
system can accommodate a certain degree of individual nodes' “aggressiveness”
while maintaining the network stability. For example, if a node sends its packets
with a probability close to 1, as long as the other nodes are conservative and
set their transmission probabilities small enough, the entire network can still
be stabilized. However, if too many “aggressive” nodes with large transmission
probabilities coexist, the system stability may collapse, leading to a tragedy of
the commons.
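The sufficient conditions of Theorem 6.2 are easy to evaluate numerically. The sketch below builds the slopes from (6.5) and the Jacobian from (6.10) for a candidate operating point, then reports the Gershgorin row and column radii used in the proof. It assumes every $p_k^*$ lies strictly between 0 and 1; the helper names and the sample operating points are hypothetical.

```python
from math import prod


def belief_slopes(p_star):
    """Slopes a_k^* from (6.5) that make p_star a fixed point of (6.7)."""
    K = len(p_star)
    return [prod(1.0 - p_star[i] for i in range(K) if i != k) / p_star[k]
            for k in range(K)]


def jacobian_br(p_star):
    """Jacobian (6.10): J[i][k] is the derivative of p_i^t w.r.t. p_k^{t-1}."""
    K = len(p_star)
    return [[0.5 if i == k else -p_star[i] / (2.0 * (1.0 - p_star[k]))
             for k in range(K)] for i in range(K)]


def gershgorin_radii(J):
    """Off-diagonal absolute row sums and column sums of J."""
    K = len(J)
    rows = [sum(abs(J[k][i]) for i in range(K) if i != k) for k in range(K)]
    cols = [sum(abs(J[i][k]) for i in range(K) if i != k) for k in range(K)]
    return rows, cols


def satisfies_6_8(p_star):
    """True if either sufficient condition in (6.8) holds."""
    K = len(p_star)
    cond_sum = sum(p_star) < 1.0
    cond_row = all(
        sum(p_star[k] / (1.0 - p_star[i]) for i in range(K) if i != k) < 1.0
        for k in range(K))
    return cond_sum or cond_row


if __name__ == "__main__":
    for p in ([0.2, 0.3, 0.4], [0.6, 0.6, 0.6]):   # hypothetical operating points
        rows, cols = gershgorin_radii(jacobian_br(p))
        print(p, "a* =", belief_slopes(p), "condition (6.8):", satisfies_6_8(p))
        print("   row radii:", rows, "column radii:", cols)
```

Since every disc is centered at 1/2, a set of radii that are all below 1/2 places the eigenvalues inside the unit circle, which is the argument used in the proof above.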

6.3.2.2 Global Convergence

Note that Theorem 6.2 only investigates the stability for different fixed points,
i.e. Algorithm 6.1 converges to these points when initial values are close enough
to them. In addition to local stability, we are also interested in characterizing
the global convergence of Algorithm 6.1 when using various $a_k$ to initialize the
belief function $\tilde{s}_k$.

Theorem 6.3 Regardless of any initial value chosen for $\{p_k^0\}_{k=1}^K$, if the parameters
$\{a_k\}_{k=1}^K$ in the belief functions $\{\tilde{s}_k\}_{k=1}^K$ satisfy

$$\sum_{i \in \mathcal{K}\setminus\{k\}} \frac{1}{a_i} < 1, \; \forall k \in \mathcal{K}, \tag{6.11}$$

Algorithm 6.1 converges to a unique CE.

Proof: For $a_k > 1$, the self-mapping function in (6.7) can be rewritten as

$$p_k^t = \frac{p_k^{t-1}}{2} + \frac{\prod_{i \in \mathcal{K}\setminus\{k\}} (1 - p_i^{t-1})}{2 a_k}. \tag{6.12}$$

We can prove for Algorithm 6.1 the uniqueness of and the convergence to CE
by showing that function (6.12) is a contraction map if the condition in (6.11) is
satisfied.

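Let $d(\cdot)$ be the distance function induced by a certain vector norm in the
Euclidean space. Consider two sequences of the transmission probability vectors
$\{p^0, \ldots, p^{t-1}, p^t, \ldots\}$ and $\{\hat{p}^0, \ldots, \hat{p}^{t-1}, \hat{p}^t, \ldots\}$. We have

$$d(p^t, \hat{p}^t) = \|p^t - \hat{p}^t\| \le \|J^{BR}\| \cdot \|p^{t-1} - \hat{p}^{t-1}\| = \|J^{BR}\| \, d(p^{t-1}, \hat{p}^{t-1}). \tag{6.13}$$

The matrix norm used here is induced by the same vector norm. Using $\|\cdot\|_1$ for
the Jacobian matrix of (6.12) as given in (6.9), and noting that each product
$\prod_{j \in \mathcal{K}\setminus\{i,k\}} (1 - p_j^{t-1})$ is at most 1, we have

$$\|J^{BR}\|_1 = \max_{k \in \mathcal{K}} \sum_{i=1}^{K} |J_{ik}^{BR}| \le \frac{1}{2} + \frac{1}{2} \max_{k \in \mathcal{K}} \sum_{i \in \mathcal{K}\setminus\{k\}} \frac{1}{a_i}. \tag{6.14}$$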


Therefore, if the condition in (6.11) is satisfied, there exist a constant $q \in [0, 1)$
and a positive $\epsilon$ such that $q = \|J^{BR}\|_1 = 1 - \epsilon < 1$ and
$\|p^t - \hat{p}^t\|_1 \le q \, \|p^{t-1} - \hat{p}^{t-1}\|_1$. From the contraction mapping theorem [GD03],
the self-mapping function in (6.7) has a unique fixed point and the sequence
$\{p^t\}_{t=0}^{\infty}$ converges to the unique fixed point. ■
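The contraction argument can be illustrated numerically by running the map (6.12) from two different starting points and comparing the $\ell_1$ distance between the trajectories with the bound from (6.14). This is a sketch under the assumption that every $a_k > 1$, so that no clipping at 1 occurs; the slopes and initial points are hypothetical.

```python
from math import prod


def br_map(p, a):
    """Best-response map (6.12); no clipping is needed when every a_k > 1."""
    K = len(p)
    return [p[k] / 2.0
            + prod(1.0 - p[i] for i in range(K) if i != k) / (2.0 * a[k])
            for k in range(K)]


def contraction_bound(a):
    """Upper bound on the induced 1-norm of the Jacobian, as in (6.14)."""
    K = len(a)
    worst = max(sum(1.0 / a[i] for i in range(K) if i != k) for k in range(K))
    return 0.5 + 0.5 * worst


def l1_distance(x, y):
    """l1 distance between two probability vectors."""
    return sum(abs(u - v) for u, v in zip(x, y))


if __name__ == "__main__":
    a = [3.0, 4.0, 5.0]                       # hypothetical slopes satisfying (6.11)
    q = contraction_bound(a)
    p, p_hat = [0.9, 0.1, 0.5], [0.0, 1.0, 0.3]
    for t in range(1, 6):
        d_prev = l1_distance(p, p_hat)
        p, p_hat = br_map(p, a), br_map(p_hat, a)
        print(t, l1_distance(p, p_hat), "<=", q * d_prev)
```

Each printed distance should not exceed $q$ times the previous one, which is the per-stage shrinkage that the contraction mapping theorem relies on.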


Remark 6.2 We can also alternatively derive a sufficient condition using $\|\cdot\|_\infty$
for (6.12) to be a contraction map. We have

$$\|J^{BR}\|_\infty = \max_{k \in \mathcal{K}} \sum_{i=1}^{K} |J_{ki}^{BR}| \le \frac{1}{2} + \frac{1}{2} \max_{k \in \mathcal{K}} \frac{K - 1}{a_k}. \tag{6.15}$$

Therefore, if $a_k > K - 1$, $\forall k \in \mathcal{K}$, Algorithm 6.1 also globally converges. However,
it is easy to verify that it is a special case of the sufficient condition given by
(6.11). In addition, we can see from (6.11) that, if the accumulated “aggressiveness”
of the nodes in the entire network reaches a certain threshold, the global
convergence property may not hold. However, if all the nodes back off adequately
by choosing their algorithm parameters $\{a_k\}_{k=1}^K$ such that condition (6.11) is
satisfied, Algorithm 6.1 globally converges.


Remark 6.3 Under the sufficient condition in (6.11), by substituting (6.5) into
(6.11), the limiting points lie in the set

$$P' = \Big\{ p^* = (p_1^*, \ldots, p_K^*) \;\Big|\; \sum_{i \in \mathcal{K}\setminus\{k\}} \frac{p_i^*}{\prod_{j \in \mathcal{K}\setminus\{i\}} (1 - p_j^*)} < 1, \; \forall k \in \mathcal{K} \Big\}. \tag{6.16}$$

It is easy to check that this is a subset of $\{p^* = (p_1^*, \ldots, p_K^*) \mid \sum_{k=1}^K p_k^* < 1\}$
for $K > 2$, which verifies the intuition that the set that Algorithm 6.1 globally
converges to should be a subset of the set of locally stable CE.


6.3.3  A  Gradient  Play  Algorithm 


The  best-response  based  dynamics  may  lead  to  large  fluctuations  in  the  entire 
network,  which  may  not  be  desirable  if  we  want  to  avoid  temporary  system-wide 
instability.  Therefore,  in  this  subsection,  we  propose  an  alternative  gradient  play 
algorithm.  At  each  iteration,  each  node  updates  its  action  gradually  in  the  ascent 
direction of its conjectured utility function in (6.6). Specifically, at stage $t$, node
$k$ chooses its transmission probability according to

$$p_k^t = \left[ p_k^{t-1} + \gamma_k \left. \frac{\partial u_k^t\big(\tilde{s}_k^t(p_k), p_k\big)}{\partial p_k} \right|_{p_k = p_k^{t-1}} \right]_0^1, \tag{6.17}$$

in which $[x]_a^b$ means $\max\{\min\{x, b\}, a\}$. As long as the stepsize $\gamma_k$ is small enough,
the entire network will “evolve” smoothly and temporary system-wide instability
will not occur. In the following, we assume that all nodes use the same stepsize
$\gamma_k = \gamma$, $\forall k \in \mathcal{K}$, and $0 < p_k^{t-1} < 1$. If $\gamma$ is sufficiently small, substituting the
utility function (6.6) into (6.17), we have

$$p_k^t = p_k^{t-1} + \gamma \Big\{ \prod_{i \in \mathcal{K}\setminus\{k\}} (1 - p_i^{t-1}) - a_k p_k^{t-1} \Big\}. \tag{6.18}$$

Table 6.2: Algorithm 6.2: A Distributed Gradient Play Algorithm for Random Access.

Initialize: $t = 0$, stepsize $\gamma$, the transmission probability $p_k^0 \in [0, 1]$, and the
    parameter $a_k > 0$ in node $k$'s belief function, $\forall k \in \mathcal{K}$.
procedure
    Repeat
        Locally at each node $k$, iterate through $t$:
        Set $t \leftarrow t + 1$.
        for all $k \in \mathcal{K}$ do
            At stage $t$, $p_k^t \leftarrow \big[ p_k^{t-1} + \gamma \{ \prod_{i \in \mathcal{K}\setminus\{k\}} (1 - p_i^{t-1}) - a_k p_k^{t-1} \} \big]_0^1$.
        end for
        Node $k$ decides if it will transmit data with a probability $p_k^t$ (or
        equivalently, maintain a window size of $CW_k = 2/p_k^t - 1$) for all the time
        slots during stage $t$.
end procedure

The detailed description of the distributed gradient play learning mechanism is
summarized in Algorithm 6.2. As for Algorithm 6.1, we investigate the stability
and convergence of this gradient play algorithm.
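The gradient play update of Algorithm 6.2 can be simulated in the same way as the best response update. The sketch below assumes synchronous updates with a common stepsize and uses the projection onto $[0, 1]$ from (6.17); the function names, stepsize, slopes, and initial point are illustrative assumptions.

```python
from math import prod


def gradient_play_step(p, a, gamma):
    """One stage of the projected update in Algorithm 6.2."""
    K = len(p)
    new_p = []
    for k in range(K):
        s_k = prod(1.0 - p[i] for i in range(K) if i != k)
        x = p[k] + gamma * (s_k - a[k] * p[k])      # ascent step from (6.18)
        new_p.append(max(min(x, 1.0), 0.0))         # projection [x]_0^1
    return new_p


def run_gradient_play(p0, a, gamma, tol=1e-12, max_stages=100000):
    """Iterate until successive iterates differ by less than tol."""
    p = list(p0)
    for stage in range(1, max_stages + 1):
        nxt = gradient_play_step(p, a, gamma)
        if max(abs(x - y) for x, y in zip(nxt, p)) < tol:
            return nxt, stage
        p = nxt
    return p, max_stages


if __name__ == "__main__":
    a = [3.0, 3.0, 3.0]                             # hypothetical belief slopes
    p_star, stages = run_gradient_play([0.9, 0.1, 0.5], a, gamma=0.05)
    print("limit point:", p_star, "after", stages, "stages")
```

Any fixed point of this update in the interior of $[0, 1]^K$ satisfies $a_k p_k^* = \prod_{i \neq k}(1 - p_i^*)$, the same relation as for the best response map, so for the same $\{a_k\}$ both procedures target the same CE; only the trajectory differs.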

6.3.3.1 Local Stability

First of all, the following theorem describes a stable CE set in $P$ for Algorithm
6.2.

Theorem 6.4 For any $p^* = (p_1^*, \ldots, p_K^*) \in P$, if

$$\sum_{k=1}^{K} p_k^* < 1, \quad \text{or} \quad \sum_{i \in \mathcal{K}\setminus\{k\}} \frac{p_k^*}{1 - p_i^*} < 1, \; \forall k \in \mathcal{K}, \tag{6.19}$$

and the stepsize $\gamma$ is sufficiently small, $p^*$ is a stable CE for Algorithm 6.2.

Proof: Consider the Jacobian matrix $J^{GP}$ of the self-mapping function in
(6.18). We have $J_{ik}^{GP} = \partial p_i^t / \partial p_k^{t-1}$. As discussed above, for $p^* = (p_1^*, \ldots, p_K^*) \in P$
to be a fixed point of the self-mapping function in (6.18), $a_k$ must be set to be
$a_k^* = \prod_{i \in \mathcal{K}\setminus\{k\}} (1 - p_i^*)/p_k^*$. It follows that

$$J_{ik}^{GP}\Big|_{p = p^*,\, a = a^*} = \begin{cases} 1 - \gamma \prod_{j \in \mathcal{K}\setminus\{k\}} (1 - p_j^*)/p_k^*, & \text{if } i = k, \\[4pt] -\gamma \prod_{j \in \mathcal{K}\setminus\{i,k\}} (1 - p_j^*), & \text{if } i \neq k. \end{cases} \tag{6.20}$$

$p^*$ is stable if and only if the eigenvalues $\{\lambda_k\}_{k=1}^K$ of matrix $J^{GP}$ are all inside the
unit circle of the complex plane, i.e. $|\lambda_k| < 1$, $\forall k \in \mathcal{K}$. Recall that the spectral
radius $\rho(J)$ of a matrix $J$ is the maximal absolute value of the eigenvalues [HJ81].
Therefore, it is equivalent to prove that $\rho(J^{GP}) < 1$.

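To a vector $w = (w_1, \ldots, w_K) \in \mathbb{R}_+^K$ with positive entries, we associate a
weighted $\ell_\infty$ norm, defined as

$$\|x\|_\infty^w = \max_{k \in \mathcal{K}} \frac{|x_k|}{w_k}. \tag{6.21}$$

The vector norm $\|\cdot\|_\infty^w$ induces a matrix norm, defined by

$$\|A\|_\infty^w = \max_{k \in \mathcal{K}} \frac{1}{w_k} \sum_{i=1}^{K} |a_{ki}| \, w_i. \tag{6.22}$$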

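According to Proposition A.20 in [BT97], $\rho(J^{GP}) \le \|J^{GP}\|_\infty^w$. Consider the vector
$w = (w_1, \ldots, w_K)$ in which $w_k = p_k^*(1 - p_k^*)$. For $\gamma$ small enough that
$1 - \gamma a_k^* \ge 0$, $\forall k \in \mathcal{K}$, we have

$$\|J^{GP}\|_\infty^w = \max_{k \in \mathcal{K}} \frac{1}{p_k^*(1 - p_k^*)} \sum_{i=1}^{K} |J_{ki}^{GP}| \, p_i^*(1 - p_i^*) = \max_{k \in \mathcal{K}} \left\{ 1 - \frac{\gamma \prod_{j \in \mathcal{K}\setminus\{k\}} (1 - p_j^*)}{p_k^*} \Big[ 1 - \sum_{i \in \mathcal{K}\setminus\{k\}} \frac{p_i^*}{1 - p_k^*} \Big] \right\}. \tag{6.23}$$

Therefore, if $\sum_{k \in \mathcal{K}} p_k^* < 1$, there exists some $\beta > 0$ such that

$$\frac{\prod_{j \in \mathcal{K}\setminus\{k\}} (1 - p_j^*)}{p_k^*} \Big[ 1 - \sum_{i \in \mathcal{K}\setminus\{k\}} \frac{p_i^*}{1 - p_k^*} \Big] \ge \beta, \; \forall k \in \mathcal{K}. \tag{6.24}$$

If the stepsize $\gamma$ satisfies $0 < \gamma < 1/\beta$, we have

$$\|J^{GP}\|_\infty^w = \max_{k \in \mathcal{K}} \left\{ 1 - \frac{\gamma \prod_{j \in \mathcal{K}\setminus\{k\}} (1 - p_j^*)}{p_k^*} \Big[ 1 - \sum_{i \in \mathcal{K}\setminus\{k\}} \frac{p_i^*}{1 - p_k^*} \Big] \right\} \le 1 - \gamma\beta < 1. \tag{6.25}$$

Since $\rho(J^{GP}) \le \|J^{GP}\|_\infty^w < 1$, all the eigenvalues of $J^{GP}$ must fall into the unit
circle $|\lambda| < 1$. Therefore, $p^*$ is a stable CE. Similarly, by choosing $w = [1, \cdots, 1]$,
we can show that, if

$$\sum_{i \in \mathcal{K}\setminus\{k\}} \frac{p_k^*}{1 - p_i^*} < 1, \; \forall k \in \mathcal{K}, \tag{6.26}$$

and $\gamma$ is sufficiently small, $p^*$ is also stable. ■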
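The weighted-norm bound used in this proof can be checked numerically. The sketch below builds the Jacobian in (6.20) for a candidate operating point, evaluates the induced weighted norm (6.22) with $w_k = p_k^*(1 - p_k^*)$, and compares it with the closed form in (6.23). The helper names, the operating point, and the stepsize are assumptions made for illustration, and the stepsize is taken small enough that the diagonal entries are nonnegative.

```python
from math import prod


def jacobian_gp(p_star, gamma):
    """Jacobian (6.20) of the gradient-play map at p_star with a = a^*."""
    K = len(p_star)
    J = [[0.0] * K for _ in range(K)]
    for i in range(K):
        for k in range(K):
            if i == k:
                a_k = prod(1.0 - p_star[j] for j in range(K) if j != k) / p_star[k]
                J[i][k] = 1.0 - gamma * a_k
            else:
                J[i][k] = -gamma * prod(
                    1.0 - p_star[j] for j in range(K) if j not in (i, k))
    return J


def weighted_inf_norm(A, w):
    """Induced weighted infinity norm (6.22)."""
    K = len(A)
    return max(sum(abs(A[k][i]) * w[i] for i in range(K)) / w[k]
               for k in range(K))


def closed_form_6_23(p_star, gamma):
    """Right-hand side of (6.23); valid when 1 - gamma * a_k^* >= 0 for all k."""
    K = len(p_star)
    total = sum(p_star)
    values = []
    for k in range(K):
        a_k = prod(1.0 - p_star[j] for j in range(K) if j != k) / p_star[k]
        values.append(1.0 - gamma * a_k
                      * (1.0 - (total - p_star[k]) / (1.0 - p_star[k])))
    return max(values)


if __name__ == "__main__":
    p_star, gamma = [0.2, 0.3, 0.4], 0.05            # hypothetical values
    w = [x * (1.0 - x) for x in p_star]
    print(weighted_inf_norm(jacobian_gp(p_star, gamma), w),
          closed_form_6_23(p_star, gamma))
```

When the entries of the operating point sum to less than 1, both numbers agree and stay below 1, which bounds the spectral radius as in the proof.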


6.3.3.2 Global Convergence

Similarly to the previous subsection, we derive in the following theorem a
sufficient condition under which Algorithm 6.2 globally converges.

Theorem 6.5 Regardless of any initial value chosen for $\{p_k^0\}_{k=1}^K$, if the parameters
$\{a_k\}_{k=1}^K$ in the belief functions satisfy

$$\sum_{i \in \mathcal{K}\setminus\{k\}} \frac{1}{a_i} < 1, \; \forall k \in \mathcal{K}, \tag{6.27}$$

and the stepsize $\gamma$ is sufficiently small, Algorithm 6.2 converges to a unique CE.


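Proof: For the self-mapping function in (6.18), the elements of its Jacobian
matrix $J^{GP}$ satisfy

$$J_{ik}^{GP} = \begin{cases} 1 - \gamma a_k, & \text{if } i = k, \\[4pt] -\gamma \prod_{j \in \mathcal{K}\setminus\{i,k\}} (1 - p_j^{t-1}), & \text{if } i \neq k. \end{cases} \tag{6.28}$$

Consider the distance induced by the weighted $\ell_\infty$ norm in the Euclidean space.
We have

$$\|p^t - \hat{p}^t\|_\infty^w \le \|J^{GP}\|_\infty^w \cdot \|p^{t-1} - \hat{p}^{t-1}\|_\infty^w. \tag{6.29}$$

Using $w = (1/a_1, \cdots, 1/a_K)$ for (6.29) and taking $\gamma$ small enough that $\gamma a_k \le 1$,
$\forall k \in \mathcal{K}$, we have

$$\|J^{GP}\|_\infty^w = \max_{k \in \mathcal{K}} \Big\{ 1 - \gamma a_k + \sum_{i \in \mathcal{K}\setminus\{k\}} \frac{\gamma a_k}{a_i} \prod_{l \in \mathcal{K}\setminus\{i,k\}} (1 - p_l^{t-1}) \Big\} \le \max_{k \in \mathcal{K}} \Big\{ 1 - \gamma a_k \Big( 1 - \sum_{i \in \mathcal{K}\setminus\{k\}} \frac{1}{a_i} \Big) \Big\}. \tag{6.30}$$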

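Therefore, if the condition in (6.27) is satisfied, there exists some $\beta > 0$ such that

$$a_k \Big( 1 - \sum_{i \in \mathcal{K}\setminus\{k\}} \frac{1}{a_i} \Big) \ge \beta, \; \forall k \in \mathcal{K}. \tag{6.31}$$

If the stepsize $\gamma$ satisfies $0 < \gamma < 1/\beta$, we have

$$\|J^{GP}\|_\infty^w \le \max_{k \in \mathcal{K}} \Big\{ 1 - \gamma a_k \Big( 1 - \sum_{i \in \mathcal{K}\setminus\{k\}} \frac{1}{a_i} \Big) \Big\} \le 1 - \gamma\beta < 1. \tag{6.32}$$

Therefore, there exist a constant $q \in [0, 1)$ and a positive $\epsilon$ such that
$q = \|J^{GP}\|_\infty^w = 1 - \epsilon < 1$ and $\|p^t - \hat{p}^t\|_\infty^w \le q \, \|p^{t-1} - \hat{p}^{t-1}\|_\infty^w$. From the contraction
mapping theorem [GD03], the self-mapping function in (6.18) has a unique fixed
point and the sequence $\{p^t\}_{t=0}^{\infty}$ converges to the unique fixed point. ■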


Remark 6.4 Compare Theorems 6.4 and 6.5 with Theorems 6.2 and 6.3. We can
see that, given the same target operating point $p^*$ or parameters $\{a_k\}_{k=1}^K$, Algorithm
6.2 exhibits similar properties in terms of local stability and global convergence,
provided that its stepsize $\gamma$ is sufficiently small. In other words, the limiting
behaviors of these two distinct algorithms are similar. However, we need to consider
some design trade-offs for both algorithms and choose the desired algorithm based
on the specific system requirements about the speed of convergence and the
performance fluctuation. Generally speaking, the best response algorithm converges
fast, but it may cause temporary large fluctuations during the convergence process,
which is not desirable for transporting constant-bit-rate applications. On the
other hand, the gradient play algorithm with small stepsize will evolve smoothly
at the cost of sacrificing its convergence rate.

6.3.4  Alternative  Interpretations  of  the  Conjecture-based  Algorithms 

In this section, we re-interpret the proposed algorithms using the backoff
mechanism model in which the transmission probabilities change from time slot
to time slot [NKG00], which helps us to understand the key difference between
the proposed algorithms and 802.11 DCF. The superscript $t$ in this subsection
represents the numbering of the time slots. We define $T_k^t$ and $T_{-k}^t$ as the events
that node $k$ transmits data at time slot $t$ and any node in $\mathcal{K}\setminus\{k\}$ transmits data at
time slot $t$, respectively. If $a_k > 1$, the RHS of (6.7) equals
$\frac{p_k^{t-1}}{2} + \frac{\prod_{i \in \mathcal{K}\setminus\{k\}} (1 - p_i^{t-1})}{2 a_k}$
and the best response update function in (6.7) can be rewritten as

$$p_k^t = \frac{p_k^{t-1}}{2} E\big\{1_{\{T_{-k}^{t-1} = 1\}} \big| p^{t-1}\big\} + \frac{1}{2}\Big(1 + \frac{1}{a_k}\Big) E\big\{1_{\{T_k^{t-1} = 1\}} 1_{\{T_{-k}^{t-1} = 0\}} \big| p^{t-1}\big\} + \frac{1}{2 a_k} E\big\{1_{\{T_k^{t-1} = 0\}} 1_{\{T_{-k}^{t-1} = 0\}} \big| p^{t-1}\big\}, \tag{6.33}$$

where $1_a$ is an indicator function of event $a$ taking place, $E\{a|b\}$ is the expected
value of $a$ given $b$, $E\{1_{\{T_k^{t-1} = 1\}} | p^{t-1}\} = p_k^{t-1}$, and
$E\{1_{\{T_{-k}^{t-1} = 1\}} | p^{t-1}\} = 1 - \prod_{i \in \mathcal{K}\setminus\{k\}} (1 - p_i^{t-1})$. According to (6.33), we can provide an alternative
interpretation of the best-response update algorithm as follows. Consider the
following update algorithm. At each time slot, if node $k$ observes that any other node
attempts to transmit, i.e. it senses a busy channel, it reduces its transmission
probability by a factor 1/2. If no transmission attempt is made by any node in the
system, node $k$ sets its transmission probability to be $1/(2a_k)$. Otherwise, if node
$k$ makes a successful transmission, it will transmit with probability $0.5(1 + 1/a_k)$
in the next time slot. We can see that equation (6.33) characterizes the expected
trajectory of this alternative update mechanism.

Figure 6.3: Comparison between the best response algorithm and the IEEE 802.11 DCF
($P_{max}$ is specified in the DCF protocol).

    Did node k transmit?   Did other nodes transmit?   Best response (BR)            802.11 DCF
    Yes                    Yes                         p_k^t = p_k^{t-1}/2           p_k^t = p_k^{t-1}/2
    Yes                    No                          p_k^t = 0.5(1 + 1/a_k)        p_k^t = P_max
    No                     Yes                         p_k^t = p_k^{t-1}/2           p_k^t = p_k^{t-1}
    No                     No                          p_k^t = 1/(2a_k)              p_k^t = p_k^{t-1}

Fig. 6.3 compares this new
interpretation with the IEEE 802.11 DCF [LTH07]. We can see that node $k$
behaves similarly in the best response algorithm and the IEEE 802.11 DCF if
it made a transmission attempt in the previous time slot, and the fundamental
difference between these two protocols is how node $k$ updates its action given
that it did not transmit in the previous time slot. In DCF, $p_k^t$ is kept the same
as $p_k^{t-1}$. However, as we can see from (6.33), the best response algorithm either
performs back-off if the channel is busy or sets $p_k^t$ to be $1/(2a_k)$ if the channel is
free.
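The slot-level rule described above can be checked against the stage-level best response by enumerating all transmission outcomes of one slot and averaging the resulting updates. The sketch below assumes independent transmissions with probabilities $p^{t-1}$ and $a_k > 1$; the function names and numerical values are hypothetical.

```python
from itertools import product
from math import prod


def slot_update_br(p_prev_k, k_transmitted, others_transmitted, a_k):
    """Per-slot rule: halve on a busy channel, otherwise reset by outcome."""
    if others_transmitted:
        return p_prev_k / 2.0                 # back off, busy channel
    if k_transmitted:
        return 0.5 * (1.0 + 1.0 / a_k)        # successful transmission
    return 1.0 / (2.0 * a_k)                  # idle slot


def expected_slot_update(p, k, a_k):
    """Average the per-slot rule over all 2^K transmission outcomes."""
    K = len(p)
    expectation = 0.0
    for outcome in product([0, 1], repeat=K):
        prob = prod(p[i] if outcome[i] else 1.0 - p[i] for i in range(K))
        others = any(outcome[i] for i in range(K) if i != k)
        expectation += prob * slot_update_br(p[k], bool(outcome[k]), others, a_k)
    return expectation


if __name__ == "__main__":
    p, k, a_k = [0.3, 0.5, 0.2], 0, 2.0       # hypothetical values
    s_k = prod(1.0 - p[i] for i in range(len(p)) if i != k)
    print(expected_slot_update(p, k, a_k), p[k] / 2.0 + s_k / (2.0 * a_k))
```

The two printed numbers coincide, since averaging the three per-slot outcomes reproduces $p_k^{t-1}/2 + \prod_{i \neq k}(1 - p_i^{t-1})/(2a_k)$.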

Remark 6.5 Both Equation (6.5) and (6.33) intuitively explain the meaning of
the algorithm parameters $\{a_k\}_{k=1}^K$. Note that the numerator of (6.5),
$\prod_{i \in \mathcal{K}\setminus\{k\}} (1 - p_i^*)$, represents the probability that transmitter $k$ experiences a contention-free
environment at $p^*$. The value of $1/a_k^*$, i.e. the ratio between node $k$'s transmission
probability $p_k^*$ and its contention-free probability, indicates the “aggressiveness” of
this particular node at equilibrium. In addition, according to (6.33), the
transmission probability $1/(2a_k)$ also reflects node $k$'s “aggressiveness” in selecting its
transmission probability after it sensed a free channel. It is straightforward to
see that the selection of $\{a_k\}_{k=1}^K$ introduces some trade-off between the stability and
throughput of the network. First of all, large values of $\{a_k\}_{k=1}^K$ restrain nodes
from transmitting at a higher channel access probability, and hence stabilize
the system at the cost of reducing the throughput. On the other hand, lowering
$\{a_k\}_{k=1}^K$ increases the nodes' transmission probabilities, which may improve the
throughput performance. However, it can cause the conditions in (6.8) and (6.19)
to fail and the system to become unstable. Therefore, the problems which we will
investigate in the next subsection are which part of the throughput region can be
achieved with stable CE and how the nodes can adaptively update their $\{a_k\}_{k=1}^K$
such that the system can attain efficient and stable operating points.

Before proceeding to the next subsection, similarly as for the best response
algorithm, we present a reinterpretation of the gradient play. Equation (6.18)
can be rewritten as

$$p_k^t = (1 - \gamma a_k) \, p_k^{t-1} E\big\{1_{\{T_{-k}^{t-1} = 1\}} \big| p^{t-1}\big\} + \big[ p_k^{t-1} + \gamma (1 - a_k p_k^{t-1}) \big] E\big\{1_{\{T_{-k}^{t-1} = 0\}} \big| p^{t-1}\big\}. \tag{6.34}$$

If $a_k > 1$, the interpretation of (6.34) is that at each time slot, if node $k$ senses a
busy channel, it reduces its transmission probability by a factor $1 - \gamma a_k$, otherwise
it increases its transmission probability by an amount $\gamma(1 - a_k p_k^{t-1})$. We can see
that this interpretation of the gradient play learning resembles the well-known
AIMD (Additive Increase Multiplicative Decrease) control algorithm, which has
been widely applied in the context of congestion avoidance in computer networks
due to its superior performance in terms of convergence and efficiency [CJ89].
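The AIMD-like reading of the gradient play can be checked the same way: averaging the two per-slot branches over the busy and idle events should give back the stage-level update $p_k^{t-1} + \gamma(\prod_{i \neq k}(1 - p_i^{t-1}) - a_k p_k^{t-1})$. The sketch below assumes that "busy" means at least one other node transmits, as in the event $T_{-k}^{t-1}$; names and numbers are illustrative.

```python
from math import prod


def slot_update_gp(p_prev_k, channel_busy, a_k, gamma):
    """Per-slot gradient-play rule read off from the two branches of (6.34)."""
    if channel_busy:
        return (1.0 - gamma * a_k) * p_prev_k             # multiplicative decrease
    return p_prev_k + gamma * (1.0 - a_k * p_prev_k)      # increase on an idle channel


def expected_slot_update_gp(p, k, a_k, gamma):
    """Average the per-slot rule over the busy and idle events seen by node k."""
    idle = prod(1.0 - p[i] for i in range(len(p)) if i != k)
    return ((1.0 - idle) * slot_update_gp(p[k], True, a_k, gamma)
            + idle * slot_update_gp(p[k], False, a_k, gamma))


if __name__ == "__main__":
    p, k, a_k, gamma = [0.3, 0.5, 0.2], 0, 2.0, 0.05      # hypothetical values
    idle = prod(1.0 - p[i] for i in range(len(p)) if i != k)
    print(expected_slot_update_gp(p, k, a_k, gamma),
          p[k] + gamma * (idle - a_k * p[k]))
```

Unlike textbook AIMD, the increase term here shrinks as $p_k^{t-1}$ grows, so the resemblance is structural rather than exact.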

6.3.5  Stability  of  the  Throughput  Region 

The results in the previous subsections describe the values of $\{p_k\}_{k=1}^K$ and $\{a_k\}_{k=1}^K$
for which local stability and global convergence can be guaranteed in both
Algorithm 6.1 and 6.2. This subsection directly investigates for both algorithms the
stability of achievable operating points in the throughput region $\mathcal{T}$.

Lemma 6.1 The Pareto boundary of the throughput region $\mathcal{T}$ is the set of all
points $\tau = (\tau_1, \ldots, \tau_K)$ such that $\tau_k = p_k \prod_{i \in \mathcal{K}\setminus\{k\}} (1 - p_i)$ where $p = (p_1, \ldots, p_K)$
is a vector satisfying $p \ge 0$ and $\sum_{k \in \mathcal{K}} p_k = 1$, and each such $\tau$ is determined by
a unique such $p$.

Proof :  See  Theorem  1  in  [MM85].  ■ 

Theorem 6.6 Regardless of the number of nodes in the network, for any
Pareto-inefficient operating point $\tau^*$ in the throughput region $\mathcal{T}$, there always exists a
belief configuration $\{a_k^*\}_{k=1}^K$ that stabilizes Algorithm 6.1 and 6.2 and achieves the
throughput $\tau^*$. If $K > 2$, any Pareto-optimal operating point $\{p_k^*\}_{k=1}^K$ in $\mathcal{T}$ that
satisfies $p_k^* > 0$, $\forall k \in \mathcal{K}$, is a stable CE for Algorithm 6.1 and 6.2.

Proof: From Theorem 6.2, we know that $\sum_{k \in \mathcal{K}} p_k^* < 1$ is sufficient to guarantee
that the corresponding CE is stable. Therefore, it suffices to check that
any Pareto-inefficient operating point $\tau^*$ can be achieved with a joint transmission
probability $p^* \in P$ satisfying $\sum_{k \in \mathcal{K}} p_k^* < 1$.

Define the throughput region

$$\mathcal{T}(t) = \Big\{ (u_1(p), \ldots, u_K(p)) \;\Big|\; \exists\, p \in P, \; \sum_{k \in \mathcal{K}} p_k \le t \Big\}, \tag{6.35}$$

in which an additional constraint $\sum_{k \in \mathcal{K}} p_k \le t$ is imposed. We denote the Pareto
boundary of $\mathcal{T}(t)$ as

$$\partial\mathcal{T}(t) = \big\{ \tau \;\big|\; \nexists\, \tau' \in \mathcal{T}(t) \text{ such that } \tau'_k \ge \tau_k, \forall k \in \mathcal{K} \text{ and } \tau'_k > \tau_k, \exists k \in \mathcal{K} \big\}. \tag{6.36}$$

Following the proof of Lemma 6.1, we can draw a similar conclusion: all the
points on $\partial\mathcal{T}(t)$ satisfy $\sum_{k \in \mathcal{K}} p_k = t$. By Lemma 6.1, $\partial\mathcal{T}(1)$ corresponds to the
Pareto boundary of $\mathcal{T}$. Note that $\partial\mathcal{T}(0) = 0$. In other words, varying $t$ from
1 to 0 will cause $\partial\mathcal{T}(t)$ to continuously shrink from the Pareto boundary of the
throughput region $\mathcal{T}$ to the origin 0. Therefore, for any Pareto-inefficient point
$\tau^* \in \mathcal{T}$, there exists $0 < t' < 1$ such that $\tau^*$ lies on $\partial\mathcal{T}(t')$, i.e. $\tau^*$ can be achieved
with an action profile $p^*$ satisfying $\sum_{k \in \mathcal{K}} p_k^* = t' < 1$.

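To prove that the points on the Pareto boundary are stable CE when $K > 2$, we need to show
that the eigenvalues $\{\xi_k\}_{k=1}^K$ of the Jacobian matrices $J^{BR}$ and $J^{GP}$ are all inside
the unit circle of the complex plane [GD03], i.e. $|\xi_k| < 1$, $\forall k \in \mathcal{K}$. Take the best
response dynamics for example. To determine the eigenvalues of $J^{BR}$, we have

$$\det(\xi I - J^{BR}) = \det \begin{pmatrix} \xi - \frac{1}{2} & \frac{p_1}{2(1 - p_2)} & \cdots & \frac{p_1}{2(1 - p_K)} \\ \frac{p_2}{2(1 - p_1)} & \xi - \frac{1}{2} & \cdots & \frac{p_2}{2(1 - p_K)} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{p_K}{2(1 - p_1)} & \frac{p_K}{2(1 - p_2)} & \cdots & \xi - \frac{1}{2} \end{pmatrix}.$$

Therefore, we can see that the eigenvalues of $J^{BR}$ are the roots of

$$\prod_{k=1}^{K} \Big( \xi - \frac{1}{2} - \frac{p_k}{2(1 - p_k)} \Big) \cdot \Big[ 1 + \sum_{k=1}^{K} \frac{\frac{p_k}{2(1 - p_k)}}{\xi - \frac{1}{2} - \frac{p_k}{2(1 - p_k)}} \Big] = 0. \tag{6.37}$$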

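Denote $f(\xi) = \sum_{k=1}^{K} \frac{p_k / (2(1 - p_k))}{\xi - \frac{1}{2} - \frac{p_k}{2(1 - p_k)}}$. First, we assume that $p_i \neq p_j$, $\forall i \neq j$.
Without loss of generality, consider $p_1 < p_2 < \cdots < p_K$. In this case, the eigenvalues
of $J^{BR}$ are the roots of $f(\xi) = -1$. Note that $f(\xi)$ is a continuous function
and it strictly decreases in
$\big(-\infty, \frac{1}{2} + \frac{p_1}{2(1 - p_1)}\big)$, $\big(\frac{1}{2} + \frac{p_1}{2(1 - p_1)}, \frac{1}{2} + \frac{p_2}{2(1 - p_2)}\big)$, $\cdots$,
$\big(\frac{1}{2} + \frac{p_{K-1}}{2(1 - p_{K-1})}, \frac{1}{2} + \frac{p_K}{2(1 - p_K)}\big)$, and $\big(\frac{1}{2} + \frac{p_K}{2(1 - p_K)}, +\infty\big)$. We also have

$$\lim_{\xi \to \left(\frac{1}{2} + \frac{p_n}{2(1 - p_n)}\right)^-} f(\xi) = -\infty, \quad \lim_{\xi \to \left(\frac{1}{2} + \frac{p_n}{2(1 - p_n)}\right)^+} f(\xi) = +\infty, \quad n = 1, 2, \cdots, K,$$

and $\lim_{\xi \to -\infty} f(\xi) = \lim_{\xi \to +\infty} f(\xi) = 0$. Therefore, the roots of $f(\xi) = -1$ lie in
$\big(-\infty, \frac{1}{2} + \frac{p_1}{2(1 - p_1)}\big)$, $\big(\frac{1}{2} + \frac{p_1}{2(1 - p_1)}, \frac{1}{2} + \frac{p_2}{2(1 - p_2)}\big)$, $\cdots$,
$\big(\frac{1}{2} + \frac{p_{K-1}}{2(1 - p_{K-1})}, \frac{1}{2} + \frac{p_K}{2(1 - p_K)}\big)$, respectively.

For the operating points on the Pareto boundary, we have $\sum_{k=1}^{K} p_k = 1$. It is
easy to verify that $f(0) = -1$, i.e. $\xi_1 = 0$. Therefore,

$$\rho(J^{BR}) = \max_{k} |\xi_k| = \xi_K \in \Big( \frac{1}{2} + \frac{p_{K-1}}{2(1 - p_{K-1})}, \; \frac{1}{2} + \frac{p_K}{2(1 - p_K)} \Big). \tag{6.38}$$

To see that $\xi_K < 1$ for $0 < p_1 < p_2 < \cdots < p_K$ and $\sum_{k=1}^{K} p_k = 1$, we differentiate
two cases:

1) If $p_K \le 0.5$, we have $\xi_K < \frac{1}{2} + \frac{p_K}{2(1 - p_K)} \le 1$.

2) If $p_K > 0.5$, we have $\frac{1}{2} + \frac{p_{K-1}}{2(1 - p_{K-1})} < 1$ and $\frac{1}{2} + \frac{p_K}{2(1 - p_K)} > 1$. Since $f(\xi)$
strictly decreases in $\big(\frac{1}{2} + \frac{p_{K-1}}{2(1 - p_{K-1})}, \frac{1}{2} + \frac{p_K}{2(1 - p_K)}\big)$, we have $\rho(J^{BR}) < 1$ if and only
if $f(1) < -1$. In fact,

$$f(1) - (-1) = \sum_{k=1}^{K} \frac{p_k}{1 - 2p_k} + 1 = \sum_{k=1}^{K-1} p_k \Big( \frac{1}{1 - 2p_k} - \frac{1}{1 - 2\sum_{m=1}^{K-1} p_m} \Big) < 0. \tag{6.39}$$

The inequality holds because $\frac{1}{1 - 2p_k} < \frac{1}{1 - 2\sum_{m=1}^{K-1} p_m}$ for $k = 1, 2, \ldots, K - 1$ when
$p_K > 0.5$, $0 < p_1 < p_2 < \cdots < p_K$, and $\sum_{k=1}^{K} p_k = 1$.

Figure 6.4: Comparison among different solution concepts.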


Second,  we  consider  the  cases  in  which  there  exists  pi  =  pj  for  certain  i,j.  Sup¬ 
pose  that  {pk}{ ~i  take  M  discrete  values  «i,  •  •  •  and  the  number  of  {pk}k=i 
that  equal  to  Km  is  nm.  In  this  case,  Equation  (6.37)  is  reduced  to 

K  Pk  M  1 

)""  0. 


k=  1  2 


tlr 


2(1  -  Km)' 


(6.40) 


2(1— Pk)  m=l 

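Hence, equation $f(\xi) = -1$ has $K + M - \sum_{m=1}^{M} n_m$ roots in total, and $\xi = \frac{1}{2} + \frac{\kappa_m}{2(1-\kappa_m)}$ is a root of multiplicity $n_m - 1$, $\forall m$. All these roots are the eigenvalues of matrix $J^{BR}$. Similarly, the remaining roots of $f(\xi) = -1$ lie in $\left(-\infty, \frac{1}{2} + \frac{\kappa_1}{2(1-\kappa_1)}\right)$, $\left(\frac{1}{2} + \frac{\kappa_1}{2(1-\kappa_1)}, \frac{1}{2} + \frac{\kappa_2}{2(1-\kappa_2)}\right)$, $\cdots$, $\left(\frac{1}{2} + \frac{\kappa_{M-1}}{2(1-\kappa_{M-1})}, \frac{1}{2} + \frac{\kappa_M}{2(1-\kappa_M)}\right)$. If $\kappa_M > 0.5$, $\sum_{m=1}^{M} n_m \kappa_m = 1$ is still sufficient to guarantee that $f(1) < -1$. Therefore, $|\xi_k| < 1$, $\forall k \in \mathcal{K}$. ■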
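The root-location argument above can be checked numerically. The following is a minimal sketch, not the author's code: it assumes the single-cell best-response Jacobian has diagonal entries 1/2 and off-diagonal entries $-p_i / (2(1-p_k))$, and that $f(\xi) = \sum_k c_k / (\xi - 1/2 - c_k)$ with $c_k = p_k / (2(1-p_k))$, which is the form consistent with $f(0) = -1$ and with (6.39). The function names are hypothetical.

```python
import numpy as np

def best_response_jacobian(p):
    """Assumed single-cell Jacobian: J[i, i] = 1/2, J[i, k] = -p_i / (2 (1 - p_k)) for i != k."""
    p = np.asarray(p, dtype=float)
    J = -np.outer(p, 1.0 / (2.0 * (1.0 - p)))
    np.fill_diagonal(J, 0.5)
    return J

def f(xi, p):
    """Assumed form of f: sum_k c_k / (xi - 1/2 - c_k), with c_k = p_k / (2 (1 - p_k))."""
    p = np.asarray(p, dtype=float)
    c = p / (2.0 * (1.0 - p))
    return float(np.sum(c / (xi - 0.5 - c)))

def spectral_radius(J):
    """Largest eigenvalue magnitude of J."""
    return float(np.max(np.abs(np.linalg.eigvals(J))))

# A Pareto-boundary point: distinct probabilities that sum to one.
p = np.array([0.1, 0.2, 0.3, 0.4])
J = best_response_jacobian(p)
eigs = np.sort(np.linalg.eigvals(J).real)
print("eigenvalues of J:", eigs)
print("f at each eigenvalue:", [f(x, p) for x in eigs])
print("spectral radius:", spectral_radius(J))
```

Under these assumptions, every eigenvalue should satisfy $f(\xi) = -1$, the smallest one should be zero, and the spectral radius should stay below one when $K \ge 3$.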

Fig. 6.4 compares the throughput performance of various game-theoretic solution concepts in random access games, including Nash equilibria, the Pareto frontier, locally stable conjectural equilibria, and globally convergent conjectural equilibria. As proven in Theorem 6.6, the entire space between the Nash equilibria and the Pareto frontier essentially consists of stable conjectural equilibria. In addition, as discussed in Remark 6.3, the set of globally convergent CE is a subset of the set of stable CE.

In practice, it is more important to construct algorithmic mechanisms that attain desirable CE, i.e. CE that operate stably and close to the Pareto boundary. To this end, we develop an iterative algorithm and summarize it as Algorithm 6.3. Specifically, this algorithm has an inner loop and an outer loop. The inner loop adopts either Algorithm 6.1 or 6.2 to achieve convergence for fixed $\{a_k\}_{k=1}^{K}$. The algorithm initializes $a_k > |\mathcal{K}|$ such that it initially globally converges. After converging to a stable CE, the outer loop adaptively adjusts $\{a_k\}_{k=1}^{K}$ until the desired efficiency is attained. The outer loop updates $\{a_k\}_{k=1}^{K}$ in a multiplicative manner for two reasons. First, reducing $\{a_k\}_{k=1}^{K}$ individually increases $\{p_k\}_{k=1}^{K}$ and $\sum_{k=1}^{K} p_k$, and hence moves the operating point towards the Pareto boundary. Second, multiplying $\{a_k\}_{k=1}^{K}$ by the same discount factor maintains weighted fairness among different nodes. Both reasons will be analytically explained in Section 6.4. It is also worth mentioning that individual nodes can measure the Pareto efficiency in a fully distributed manner during the outer loop iteration. For example, an individual node can estimate the other nodes' transmission probabilities $\{p_k\}_{k=1}^{K}$ based on its local observation and determine whether the current operating point is close to the Pareto boundary by calculating $\sum_{k \in \mathcal{K}} p_k$ [MHC09]. When the network size grows, individually estimating different nodes' transmission probabilities becomes challenging. An alternative solution is that individual nodes instead monitor their common observation of the aggregate throughput $\sum_{k \in \mathcal{K}} u_k$ and terminate the update of $\{a_k\}_{k=1}^{K}$ once the aggregate throughput starts to decrease.

Next, we discuss several implementation issues regarding Algorithm 6.3. First, it is not necessary that all the nodes update their parameters $\{a_k\}_{k=1}^{K}$ synchronously. However, these nodes need to maintain the same update frequency, e.g. each node updates its parameter after a certain number of timeslots or seconds. As long as $\delta$ is small, the performance gap between the actual CE and the intended one will not be large. Moreover, in order to guarantee fairness, new incoming nodes need to know the real-time parameters of the old nodes in the same traffic class. This initialization only needs to be done once, when the new nodes enter the cell, by tracking the evolution of the transmission probabilities of the nodes in the same traffic class.

Table 6.3: Algorithm 6.3: Adaptive Distributed Algorithm for Random Access.

Initialize: stepsize $\gamma$ and $\delta$, the transmission probability $p_k \in [0, 1]$, and the parameter $a_k > |\mathcal{K}|$ in node $k$'s belief function, $\forall k \in \mathcal{K}$.

procedure

  repeat

    outer loop: For each node $k$, $a_k \leftarrow a_k (1 - \delta)$.

    inner loop: Locally at each node $k$, use Algorithm 6.1 or 6.2 to update $p_k$ until it converges.

  until the aggregate throughput is maximized or $\sum_{k \in \mathcal{K}} p_k \approx 1$.

end procedure
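The two-loop structure of Algorithm 6.3 can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the thesis implementation: the inner loop uses a synchronous best-response update of the assumed form $p_k \leftarrow p_k/2 + \prod_{i \neq k}(1-p_i)/(2a_k)$, whose fixed point satisfies the CE condition $a_k p_k = \prod_{i \neq k}(1-p_i)$, and the outer loop stops once $\sum_k p_k$ reaches a hypothetical threshold `target` close to one. All function and parameter names are made up for the example.

```python
import numpy as np

def idle_prob(p):
    """prod_{i != k} (1 - p_i) for every node k."""
    q = 1.0 - p
    return np.array([np.prod(np.delete(q, k)) for k in range(len(p))])

def inner_loop(p, a, tol=1e-12, max_iter=10_000):
    """Iterate the assumed best-response update until the probabilities stop changing."""
    for _ in range(max_iter):
        p_next = np.clip(0.5 * p + idle_prob(p) / (2.0 * a), 0.0, 1.0)
        if np.max(np.abs(p_next - p)) < tol:
            return p_next
        p = p_next
    return p

def adaptive_algorithm(a0, delta=0.05, target=0.99, max_outer=500):
    """Outer loop: shrink every a_k by the common factor (1 - delta) until sum(p) >= target."""
    a = np.asarray(a0, dtype=float).copy()
    p = inner_loop(np.full(a.shape, 0.5), a)   # converge once for the initial parameters
    for _ in range(max_outer):
        if p.sum() >= target:
            break
        a *= 1.0 - delta                       # outer loop: multiplicative decrease
        p = inner_loop(p, a)                   # inner loop: re-converge from the previous CE
    return p, a

# Two traffic classes of 5 nodes each, with initial parameters 30 and 60.
p, a = adaptive_algorithm(np.array([30.0] * 5 + [60.0] * 5))
print("sum of transmission probabilities:", p.sum())
print("class-1 / class-2 probability ratio:", p[0] / p[5])
```

Because every $a_k$ is scaled by the same factor, the ratio between the two classes' parameters is preserved throughout, which is the mechanism the text credits for maintaining weighted fairness.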

6.4  Extensions  to  Heterogeneous  Networks  and  Ad-hoc 
Networks 

In this section, we first investigate how users with different quality-of-service requirements should initialize their belief functions and interact in the heterogeneous network setting, and show that the conjecture-based approaches approximately achieve weighted fairness. Furthermore, we discuss how the single-cell solution can be extended to the general ad-hoc network scenario, where only the devices within a certain neighborhood range impact each other's throughput.


6.4.1  Equilibrium  Selection  for  Heterogeneous  Networks 


Consider a network with $N > 1$ different classes of nodes. Let $\phi_n$ denote the parameter that class-$n$ nodes choose for their conjectured utility functions (i.e. the parameter $a_k$ if node $k$ belongs to class $n$) and $\mathcal{F}_n$ denote the set of nodes that set their algorithm parameters to be $\phi_n$, $1 \le n \le N$. At equilibrium, the transmission probabilities of the same class of nodes are equal, denoted as $p_n$. Before we proceed, we first define the weighted fairness for the random access game [QS02]. With each traffic class $n$, we associate a positive weight $\chi_n$. Then the weighted fairness intended for the random access game satisfies


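$$\forall i, j \in \{1, 2, \cdots, N\},\ s \in \mathcal{F}_i,\ s' \in \mathcal{F}_j: \quad \frac{E\{1_{\{T_{-s}=0\}} 1_{\{T_s=1\}}\}}{\chi_i} = \frac{E\{1_{\{T_{-s'}=0\}} 1_{\{T_{s'}=1\}}\}}{\chi_j}, \qquad (6.41)$$

which means that the probability of a successful transmission attempt for traffic class $n$ is proportional to its weight $\chi_n$. By simple manipulation, we have the equivalent form of equation (6.41) [QS02]:

$$\forall i, j \in \{1, 2, \cdots, N\}: \quad \frac{p_i^{WF}}{(1 - p_i^{WF})\chi_i} = \frac{p_j^{WF}}{(1 - p_j^{WF})\chi_j}. \qquad (6.42)$$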


Recall that Theorem 6.1 showed how to choose $\{a_k\}_{k=1}^{K}$ given a desired operating point $\{p_k^*\}_{k=1}^{K}$ such that it is a CE. The following theorem indicates the quantitative relationship between the chosen algorithm parameters $\{\phi_n\}_{n=1}^{N}$, the sizes of different classes $\{|\mathcal{F}_n|\}_{n=1}^{N}$, and the resulting steady-state transmission probabilities $\{p_n\}_{n=1}^{N}$. More importantly, it also shows that if the network size is large, the conjecture-based algorithms approximately achieve weighted fairness.
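Theorem 6.7 Suppose that $\phi_n > 2$, $\forall 1 \le n \le N$. The achieved steady-state transmission probabilities $\{p_n\}_{n=1}^{N}$ are given by

$$p_n = \frac{1}{2}\left(1 - \sqrt{1 - \frac{4g}{\phi_n}}\right), \qquad (6.43)$$

where $g$ satisfies

$$g = \prod_{n=1}^{N} \left[\frac{1}{2}\left(1 + \sqrt{1 - \frac{4g}{\phi_n}}\right)\right]^{|\mathcal{F}_n|}. \qquad (6.44)$$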

Proof: As shown in Theorem 6.1, $a_k p_k^* = \prod_{i \in \mathcal{K} \setminus \{k\}} (1 - p_i^*)$. Denote $g = \prod_{i \in \mathcal{K}} (1 - p_i)$. Therefore, we obtain

$$\phi_n p_n (1 - p_n) = g, \quad \forall 1 \le n \le N. \qquad (6.45)$$

Since $\phi_n > 2$, we have $p_n < 0.5$. Such a root of the quadratic equation in (6.45) is given in (6.43). Note that $g = \prod_{i \in \mathcal{K}} (1 - p_i)$. Substituting (6.43) into this equality, we get (6.44).

We can verify that a unique $g$ satisfying the equality in (6.44) exists if $\phi_n > 2$, $\forall 1 \le n \le N$. This is because the RHS of (6.44) is feasible for $g \le \min_{1 \le n \le N} \{\phi_n\}/4$ and it is a strictly decreasing function in $g$. Meanwhile, the LHS of (6.44) is strictly increasing on $g \in [0, \min_{1 \le n \le N} \{\phi_n\}/4]$. At $g = 0$, the LHS equals 0 while the RHS equals 1. Note that when $g = \min_{1 \le n \le N} \{\phi_n\}/4$,

$$\text{LHS of (6.44)} = \frac{1}{4} \min_{1 \le n \le N} \{\phi_n\} > \frac{1}{2} \ge \text{RHS of (6.44)} \qquad (6.46)$$

if $\phi_n > 2$. Therefore, a unique $g \in [0, \min_{1 \le n \le N} \{\phi_n\}/4]$ satisfying (6.44) exists. ■
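The existence argument in the proof suggests a simple bisection for $g$. The sketch below is illustrative only: it assumes the fixed-point equation takes the form $g = \prod_n [(1 + \sqrt{1 - 4g/\phi_n})/2]^{|\mathcal{F}_n|}$, which follows from substituting the smaller root of $\phi_n p_n (1-p_n) = g$ into $g = \prod_i (1 - p_i)$. The function name and arguments are hypothetical.

```python
import numpy as np

def solve_steady_state(phi, sizes, n_iter=200):
    """Bisection for g on [0, min(phi)/4], then the per-class probabilities p_n < 1/2."""
    phi = np.asarray(phi, dtype=float)
    sizes = np.asarray(sizes, dtype=float)

    def rhs(g):
        # Product over classes of (1 - p_n)^{|F_n|} with p_n the smaller quadratic root.
        root = np.sqrt(np.maximum(0.0, 1.0 - 4.0 * g / phi))
        return np.prod((0.5 * (1.0 + root)) ** sizes)

    lo, hi = 0.0, phi.min() / 4.0   # g - rhs(g) is negative at lo and positive at hi
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if mid - rhs(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    g = 0.5 * (lo + hi)
    p = 0.5 * (1.0 - np.sqrt(np.maximum(0.0, 1.0 - 4.0 * g / phi)))
    return g, p

# Two classes with parameters 30 and 60 and five nodes each.
g, p = solve_steady_state([30.0, 60.0], [5, 5])
print("g =", g)
print("p =", p, "ratio p_1 / p_2 =", p[0] / p[1])
```

Since the left-hand side increases and the right-hand side decreases in $g$, the sign of their difference changes exactly once on the interval, so bisection is sufficient here.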


Remark 6.6 There are several intuitions and observations that we can obtain from Theorem 6.7. First, the multiplicative decreasing update in Algorithm 6.3 aims to move the operating points towards the Pareto boundary. A quantitative approximation between the steady-state transmission probability $p_n$ and the algorithm parameter $\phi_n$ of each traffic class can be derived if a large number of nodes coexist. Since $g \to 0$ when $|\mathcal{F}_n|$ is large, using the Taylor expansion, $p_n$ can be approximated as $g/\phi_n$, i.e. the steady-state transmission probability $p_n$ decays as the inverse first power of the parameter $\phi_n$ that indicates the "aggressiveness" of traffic class $n$. Finally, we also observe from (6.45) that, if $|\mathcal{F}_n|$ is large, $p_n \to 0$ and $1 - p_n \approx 1$. Therefore,

$$\forall i, j \in \{1, 2, \cdots, N\}: \quad \phi_i p_i (1 - p_i) = \phi_j p_j (1 - p_j) \ \Rightarrow\ \frac{\phi_i p_i}{1 - p_i} \approx \frac{\phi_j p_j}{1 - p_j}. \qquad (6.47)$$

Equation (6.47) indicates that Algorithm 6.1 and 6.2 approximately achieve the weighted fairness given in (6.42) with weight $\chi_n = 1/\phi_n$. Moreover, it is worth mentioning that the weighted fairness is purely an implicit by-product of the conjecture-based approach and it can be sustained with stability. Therefore, Algorithm 6.3 chooses to multiply $\{a_k\}_{k=1}^{K}$ by the same discount factor $1 - \delta$ such that the weighted fairness can be maintained.
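The Taylor approximation invoked in Remark 6.6 can be spelled out as a short worked derivation. It assumes $p_n$ is the root of $\phi_n p_n (1 - p_n) = g$ with $p_n < 0.5$ and that $4g/\phi_n \ll 1$; the second-order term is included only to indicate the size of the error.

```latex
\begin{align*}
p_n &= \frac{1}{2}\left(1 - \sqrt{1 - \frac{4g}{\phi_n}}\right), \qquad
\sqrt{1 - x} = 1 - \frac{x}{2} - \frac{x^2}{8} + O(x^3), \quad x = \frac{4g}{\phi_n}, \\
p_n &= \frac{1}{2}\left(\frac{2g}{\phi_n} + \frac{2g^2}{\phi_n^2} + O\!\left(\frac{g^3}{\phi_n^3}\right)\right)
     = \frac{g}{\phi_n} + \frac{g^2}{\phi_n^2} + O\!\left(\frac{g^3}{\phi_n^3}\right), \\
\frac{p_i}{p_j} &\approx \frac{\phi_j}{\phi_i} \quad \text{to first order in } g .
\end{align*}
```

To first order, the transmission probabilities of two classes are therefore in inverse proportion to their parameters, which is the weighted-fairness relation with weights $1/\phi_n$.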

6.4.2  Extension  to  Ad-hoc  Networks 

Consider a wireless ad-hoc network with a set $\mathcal{K} = \{1, 2, \ldots, K\}$ of distinct node pairs, as shown in Fig. 6.5. Each link (node pair) consists of one dedicated transmitter and one dedicated receiver. We assume that the transmission of a link is interfered with by the transmission of another link if the distance between the receiver node of the former and the transmitter node of the latter is less than some threshold $D_{th}$ [LZL07, LCC07]. For any node $i$, we define $\mathcal{I}_i \subset \mathcal{K}$ as the set of nodes whose transmitters cause interference to the receiver of node $i$ and $\mathcal{O}_i \subset \mathcal{K}$ as the set of nodes whose receivers get interfered with by the transmitter of node $i$. For example, in Fig. 6.5, $\mathcal{I}_1 = \{K\}$ and $\mathcal{O}_1 = \{2, K\}$. Then, the throughput of node $k$ is

$$u_k(\mathbf{p}) = p_k \prod_{i \in \mathcal{I}_k} (1 - p_i). \qquad (6.48)$$

Figure 6.5: System model of ad hoc networks.


In this scenario, the state, namely, the contention measure signal, can be redefined according to $s_k = \prod_{i \in \mathcal{I}_k} (1 - p_i)$. Applying the conjecture-based approach, we have the following conjectured utility function for node $k$:

$$u_k^t(\tilde{s}_k^t(p_k), p_k) = p_k \left[ \prod_{i \in \mathcal{I}_k} (1 - p_i^{t-1}) - a_k (p_k - p_k^{t-1}) \right]. \qquad (6.49)$$


Parallel  to  the  theorems  proven  in  Section  6.3.2  and  6.3.3,  we  have  the  follow¬ 
ing  theorems  on  the  stability  and  convergence  of  conjecture-based  algorithms  in 
ad-hoc  networks.  These  theorems  can  be  shown  similarly  as  in  Section  6.3,  and 
hence,  the  proofs  are  omitted. 
6.4.2.1 Stability and Convergence


Theorem 6.8 For any $\mathbf{p}^* = (p_1^*, \ldots, p_K^*) \in \mathcal{P}$, if


ie(  .. 


$\mathbf{p}^*$ is a stable CE for Algorithm 6.1 and Algorithm 6.2 with sufficiently small $\gamma$.

Theorem 6.9 Regardless of any initial value chosen for $\{p_k^0\}_{k=1}^{K}$, if the parameters $\{a_k\}_{k=1}^{K}$ in the belief functions $\{\tilde{s}_k\}_{k=1}^{K}$ satisfy


(6.51) 


Algorithm 6.1 and Algorithm 6.2 with sufficiently small $\gamma$ converge to a unique CE.


Remark 6.7 We observe that the sufficient conditions in Theorem 6.8 and 6.9 are more relaxed compared with the theorems in Section 6.3. As opposed to the single-cell case, the mutual interference is reduced in ad-hoc networks due to the large-scale geographical separation. Therefore, these nodes can potentially improve their throughput by increasing their transmission probabilities while still maintaining the local stability as well as global convergence.

Remark 6.8 In ad-hoc networks, the parameters $\{a_k\}_{k=1}^{K}$ can be determined in a distributed fashion such that the sufficient conditions in Theorem 6.9 are satisfied. For example, consider the symmetric case where transmitter $i$ interferes with receiver $j$ if and only if transmitter $i$ can receive signals from receiver $j$. Each transmitter can listen to the channel and estimate $|\mathcal{O}_k|$ by intercepting the ACK packets sent by the receivers of the nodes in set $\mathcal{O}_k$. An alternative distributed solution is that each transmitter broadcasts its parameter $a_k$, and receiver
k  calculates  7  and  notifies  the  nodes  in  set  Ik  to  adjust  their  parameters 

accordingly. 
6.4.2.2 Stability of the Throughput Region


We  also  extend  the  stability  analysis  of  the  throughput  region  from  the  single-cell 
scenario  to  the  ad-hoc  networks.  The  following  lemma  explicitly  describes  the 
Pareto  frontier  of  the  throughput  region. 


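Lemma 6.2 The Pareto boundary of the throughput region $\mathcal{T}$ can be characterized as the set of points $\boldsymbol{\tau} = (\tau_1, \ldots, \tau_K)$ optimizing the weighted proportional fairness objective [GS06]:

$$\max_{\mathbf{p} \in \mathcal{P}} \sum_{k \in \mathcal{K}} \omega_k \log \tau_k, \qquad (6.52)$$

in which $\tau_k = p_k \prod_{i \in \mathcal{I}_k} (1 - p_i)$, for all possible sets of positive link "weights" $\{\omega_k\}_{k=1}^{K}$. Specifically, for a particular weight combination $\{\omega_k\}_{k=1}^{K}$, the optimal $\mathbf{p}^*$ is given by

$$p_k^* = \frac{\omega_k}{\omega_k + \sum_{i \in \mathcal{O}_k} \omega_i}. \qquad (6.53)$$

Proof: See [GS06] for details.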


Based  on  Lemma  6.2,  we  derive  in  the  following  theorem  the  necessary  and 
sufficient  condition  under  which  a  particular  Pareto-efficient  operating  point  is 
a  stable  CE  for  Algorithm  6.1.  Similar  results  can  be  derived  for  Algorithm  6.2 
with  sufficiently  small  7. 


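Theorem 6.10 Suppose $\mathbf{p}^* = (p_1^*, \ldots, p_K^*) \in \mathcal{P}$ satisfies (6.53) and maximizes the problem in (6.52). The elements of the Jacobi matrix $J$ at $\mathbf{p}^*$ satisfy

$$J_{ik} = \begin{cases} \dfrac{1}{2}, & \text{if } i = k, \\[4pt] -\dfrac{p_i^*}{2(1 - p_k^*)}, & \text{if } k \in \mathcal{I}_i, \\[4pt] 0, & \text{otherwise.} \end{cases} \qquad (6.54)$$

If $\rho(J) < 1$, $\mathbf{p}^*$ is a stable CE for Algorithm 6.1.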
Remark 6.9 Theorem 6.10 generalizes the result in Theorem 6.6 from the single-cell scenario to ad-hoc networks. Consider the $l_1$ norm of $J$ at $\mathbf{p}^*$. We have

$$\|J\|_1 = \max_{k \in \mathcal{K}} \sum_{i \in \mathcal{K}} |J_{ik}| = \max_{k \in \mathcal{K}} \left( \frac{1}{2} + \frac{\sum_{i \in \mathcal{O}_k} p_i^*}{2(1 - p_k^*)} \right). \qquad (6.55)$$

In the single-cell case, $\mathcal{O}_k = \mathcal{K} \setminus \{k\}$, $\forall k \in \mathcal{K}$, and $\|J\|_1$ equals 1 for any Pareto-optimal operating point. Therefore, any Pareto-inefficient operating point can be achieved with stability due to $\rho(J) \le \|J\|_1 < 1$. However, in ad-hoc networks, the form of the Jacobi matrix $J$ depends on the actual network topology and it is difficult to bound the spectral radius for a generic setting using certain matrix norms, such as the $l_1$ norm or the $l_\infty$ norm. Alternatively, according to Theorem 6.10, we will numerically test the stability of the Pareto-optimal operating points in the simulation section.
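The numerical stability test mentioned in Remark 6.9 can be sketched for an arbitrary interference topology. The code below is an assumption-laden illustration: the topology is encoded in a hypothetical 0/1 matrix `A` with `A[i, k] = 1` when transmitter `k` interferes with receiver `i`, the Pareto-efficient point is taken as $p_k = \omega_k / (\omega_k + \sum_{i \in \mathcal{O}_k} \omega_i)$, and the Jacobian entries are assumed to be 1/2 on the diagonal and $-p_i / (2(1 - p_k))$ wherever `A[i, k] = 1`.

```python
import numpy as np

def pareto_point(A, w):
    """p_k = w_k / (w_k + sum of w_i over receivers i that transmitter k interferes with)."""
    A = np.asarray(A, dtype=float)
    w = np.asarray(w, dtype=float)
    return w / (w + A.T @ w)

def jacobian(A, p):
    """Assumed Jacobian: 1/2 on the diagonal, -p_i / (2 (1 - p_k)) where A[i, k] = 1."""
    A = np.asarray(A, dtype=float)
    p = np.asarray(p, dtype=float)
    # Guard p_k = 1 (a node that interferes with nobody); its column of A is zero anyway.
    inv = np.where(p < 1.0, 1.0 / (2.0 * np.maximum(1.0 - p, 1e-300)), 0.0)
    J = -A * np.outer(p, inv)
    np.fill_diagonal(J, 0.5)
    return J

def spectral_radius(J):
    """Largest eigenvalue magnitude of J."""
    return float(np.max(np.abs(np.linalg.eigvals(J))))

# Single-cell topology with four nodes and equal weights.
A4 = np.ones((4, 4)) - np.eye(4)
J4 = jacobian(A4, pareto_point(A4, np.ones(4)))
print("single cell: l1 norm =", np.linalg.norm(J4, 1), "rho =", spectral_radius(J4))

# An isolated pair of mutually interfering links.
A2 = np.array([[0, 1], [1, 0]])
J2 = jacobian(A2, pareto_point(A2, [1.0, 0.5]))
print("isolated pair: rho =", spectral_radius(J2))
```

For a randomly generated topology, the same three functions can be applied to the adjacency matrix obtained by thresholding transmitter-receiver distances, which is one way to produce an empirical distribution of the spectral radius.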

6.5  Numerical  Simulations 

In  this  section,  we  numerically  compare  the  performance  of  the  existing  802.11 
DCF  protocol,  the  P-MAC  protocol  [QS02]  and  the  proposed  algorithms  in  this 
chapter. 

We first illustrate the evolution of transmission probabilities of Algorithm 6.1 and 6.2. We simulate a single-cell network of 5 nodes. For each node, the initial transmission probability $p_k^0$ is uniformly distributed in $[0, 1]$ and $a_k$ is uniformly distributed between 5 and 10. The stepsize in the gradient play is $\gamma = 0.02$. Fig. 6.6 compares the trajectories of the transmission probability updates in Algorithm 6.1 and 6.2 in a single realization, under the assumption that node $k$ can perfectly estimate the probability $\prod_{j \in \mathcal{K} \setminus \{k\}} (1 - p_j)$, $\forall k \in \mathcal{K}$. The best response update converges in around 8 iterations, while the gradient play follows a smoother trajectory and attains the same equilibrium after 35 iterations.

Figure 6.6: Dynamics of Algorithms 6.1 and 6.2.

In addition, to illustrate how individual nodes can adaptively adjust their algorithm parameters and improve their throughput, we simulate a scenario with two traffic classes. Each traffic class consists of 5 nodes and the initial algorithm parameters of class 1 and 2 are $\phi_1 = 30$ and $\phi_2 = 60$, respectively. The discount factor in Algorithm 6.3 is $\delta = 0.05$. The blue dotted curve in Fig. 6.7 indicates that the operating point moves towards the red Pareto boundary until the outer loop detects that the desired efficiency is reached.
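The qualitative difference between the two inner-loop dynamics can be reproduced with a small script. This is a sketch under assumptions, not the simulation used for Fig. 6.6: the best-response update is taken as $p_k \leftarrow p_k/2 + s_k/(2a_k)$ and the gradient play as $p_k \leftarrow p_k + \gamma (s_k - a_k p_k)$, with $s_k = \prod_{j \neq k}(1 - p_j)$, both projected onto $[0, 1]$; the parameter values and starting point are fixed by hand instead of being drawn at random.

```python
import numpy as np

def idle_prob(p):
    """s_k = prod_{j != k} (1 - p_j) for every node k."""
    q = 1.0 - p
    return np.array([np.prod(np.delete(q, k)) for k in range(len(p))])

def best_response(p, a):
    """Assumed best-response update, projected onto [0, 1]."""
    return np.clip(0.5 * p + idle_prob(p) / (2.0 * a), 0.0, 1.0)

def gradient_play(p, a, gamma):
    """Assumed gradient-play update with stepsize gamma, projected onto [0, 1]."""
    return np.clip(p + gamma * (idle_prob(p) - a * p), 0.0, 1.0)

def iterate(step, p0, n_iter):
    """Return the whole trajectory produced by repeatedly applying `step`."""
    traj = [np.asarray(p0, dtype=float)]
    for _ in range(n_iter):
        traj.append(step(traj[-1]))
    return np.array(traj)

a = np.array([5.5, 6.5, 7.5, 8.5, 9.5])    # hand-picked values between 5 and 10
p0 = np.array([0.9, 0.1, 0.5, 0.3, 0.7])   # hand-picked initial probabilities
br = iterate(lambda p: best_response(p, a), p0, 300)
gp = iterate(lambda p: gradient_play(p, a, 0.02), p0, 2000)
print("best response limit:", br[-1])
print("gradient play limit:", gp[-1])
```

Plotting the rows of `br` and `gp` against the iteration index gives trajectories comparable in shape to those described above, with the best response settling in fewer iterations.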

Figure 6.7: The trajectory of Algorithm 6.3.

In practice, packet transmission over wireless links, e.g. IEEE 802.11 WLANs, involves extra protocol overheads, such as inter-frame spaces and packet headers. Assuming these realistic communication scenarios, we compare various performance metrics, including throughput, fairness, convergence, and stability, between our proposed conjecture-based algorithms, the P-MAC protocol in [QS02], and the IEEE 802.11 DCF. To evaluate these metrics, the physical layer parameters need to be specified. In the simulation, we assume that each wireless device operates at the IEEE 802.11a PHY mode-8 [80299], and the key parameters are summarized in Table 6.4. We assume no transmission errors and the RTS/CTS mechanism is disabled. The aggregate network throughput can be calculated using Bianchi's model [Bia00]


$$T = \frac{P_s P_{tr} L_d}{(1 - P_{tr}) T_{slot} + P_{tr} P_s T_s + P_{tr} (1 - P_s) T_c}, \qquad (6.56)$$

where $P_s = \frac{\sum_{n=1}^{N} |\mathcal{F}_n| \, p_n (1 - p_n)^{|\mathcal{F}_n| - 1} \prod_{i \neq n} (1 - p_i)^{|\mathcal{F}_i|}}{P_{tr}}$ is the probability that a transmission occurring on the channel is successful, $P_{tr} = 1 - \prod_{n=1}^{N} (1 - p_n)^{|\mathcal{F}_n|}$ is the probability that at least one transmission attempt happens, $T_s$ is the average time of a successful transmission, and $T_c$ is the average duration of a collision. The detailed derivation of $T_s$ and $T_c$ using the given network parameters in Table 6.4 can be found in [QS02, Bia00]. The parameters in P-MAC are set according to [QS02]. The contention window sizes in the IEEE 802.11 DCF are $CW_{min} = 16$ and $CW_{max} = 1024$. In Algorithm 6.3, individual nodes monitor the aggregate throughput to determine whether to adjust the parameter $a_k$.

Table 6.4: IEEE 802.11a PHY mode-8 parameters

Parameters                                 Value
Duration of an Idle Slot ($T_{slot}$)      9 μs
Duration of PHY Header ($T_{PHY}$)         20 μs
SIFS Time ($T_{SIFS}$)                     16 μs
DIFS Time ($T_{DIFS}$)                     34 μs
Propagation Delay ($T_d$)                  1 μs
MAC Header ($L_{MAC}$)                     28 octets
Packet Payload Size ($L_d$)                2304 octets
ACK Frame Size ($L_{ACK}$)                 14 octets
Data Rate ($R_t$)                          54 Mbps

The numerical results are obtained using a MAC simulation program in [Bia00]. Our comparison results are summarized as follows.

First, the throughput of the three algorithms is compared. We vary the total number of nodes $K$ from 4 to 50, in which $\lceil K/2 \rceil$ nodes carry class-1 traffic and the remaining nodes carry class-2 traffic. The positive weights of class-1 and class-2 are $\chi_1 = 1$ and $\chi_2 = 0.5$. The initial parameters in Algorithm 6.3 are chosen to be $\phi_1 = 3K/\chi_1$ and $\phi_2 = 3K/\chi_2$. As shown in Fig. 6.8, both the conjecture-based algorithm and P-MAC significantly outperform the IEEE 802.11 DCF. The IEEE 802.11 DCF achieves the lowest throughput, because its lack of an adaptation mechanism for the contention window size causes more frequent packet collisions as the number of nodes increases. Surprisingly, the conjectural equilibrium attained by Algorithm 6.3 achieves the maximum achievable throughput. It also outperforms P-MAC, because P-MAC uses approximations to derive closed-form expressions for the transmission probabilities of different traffic classes.

Figure 6.8: Comparison of the accumulative throughput in the IEEE 802.11 DCF, P-MAC, and conjecture-based algorithms. Error bars correspond to the standard deviation of the mean of the 100 measurements sampled at each point. The error bars in the remaining figures are as in this figure.


Figure 6.9: Comparison of the achieved fairness of the IEEE 802.11 DCF, P-MAC, and Algorithm 6.3.

Next, we evaluate the short-term fairness of different protocols using the quantitative fairness index introduced in [QS02]

$$F = \frac{\mu(T_k/\chi_n)}{\mu(T_k/\chi_n) + \sigma(T_k/\chi_n)}, \qquad (6.57)$$

in which $T_k$ denotes the throughput of node $k$ that belongs to traffic class $n$, and $\mu$ and $\sigma$ are, respectively, the mean and the standard deviation of $T_k/\chi_n$ over all the active data traffic flows. We simulate a transmission duration of 3 minutes. The stage duration in Algorithm 6.3 is set as 50 successful transmissions. As shown in Fig. 6.9, Algorithm 6.3 and P-MAC are comparable in their fairness performance and the achieved fairness index is always above 0.95 regardless of the network configuration. On the other hand, the fairness performance of the 802.11 DCF is much poorer than that of the other two algorithms because the DCF protocol provides no fairness guarantee.
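As a small illustration of how such a fairness index behaves, the sketch below assumes the index has the form $F = \mu / (\mu + \sigma)$, with $\mu$ and $\sigma$ the mean and (population) standard deviation of the weight-normalized throughputs. The function name and the sample numbers are hypothetical.

```python
import numpy as np

def fairness_index(throughput, weights):
    """F = mu / (mu + sigma) of the weight-normalized throughputs (assumed form of the index)."""
    x = np.asarray(throughput, dtype=float) / np.asarray(weights, dtype=float)
    mu, sigma = x.mean(), x.std()
    return float(mu / (mu + sigma))

# Throughputs exactly proportional to the weights: perfectly fair.
print(fairness_index([2.0, 2.0, 1.0, 1.0], [1.0, 1.0, 0.5, 0.5]))
# One node starved: the index drops.
print(fairness_index([4.0, 0.0], [1.0, 1.0]))
```

The index equals one when every flow's throughput is exactly proportional to its weight and decreases as the normalized throughputs spread out.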

Last, in order to compare the convergence and the stability of different protocols for time-varying traffic, we simulate a network in which the number of active nodes fluctuates over time. In order to cope with traffic fluctuation, we slightly modify the outer loop in Algorithm 6.3. Once some nodes join or leave the network (this can be detected either by tracking the contention signal $\prod_{k \in \mathcal{K}} (1 - p_k)$ or by estimating the total number of nodes in the network [BT03]), the adaptation of $a_k$ is activated. Specifically, if more nodes join the network, $a_k \leftarrow a_k (1 + \delta)$; otherwise, $a_k \leftarrow a_k (1 - \delta)$. At the beginning, $|\mathcal{F}_1| = |\mathcal{F}_2| = 25$. At stage 200, 15 class-1 and 15 class-2 nodes join the network. These nodes leave the network at the 400th stage. The algorithm parameter $a_k$ is updated every 5 stages and the stepsize in the gradient play is $\gamma = 0.003$. Fig. 6.10 and Fig. 6.11 show the variation of the transmission probabilities for both traffic classes and the expected accumulative throughput over time. P-MAC does not converge due to the lack of feedback control, which agrees with the observation about the instability of P-MAC reported in [CCL08]. In addition, the optimal transmission probabilities computed by P-MAC and the conjecture-based algorithms are different under the same network parameters because of the approximation used in P-MAC. As shown in Fig. 6.10, nodes deploying P-MAC transmit with higher probabilities than nodes deploying the conjecture-based algorithms, which creates a more congested environment. As a result, the accumulative throughput achieved by P-MAC is slightly lower than the optimal throughput. In contrast, the conjecture-based algorithms enable the nodes to adaptively tune their parameters $a_k$ to maximize the network throughput while maintaining the weighted fairness as well as the system stability. As shown in Fig. 6.10 and Fig. 6.11, during stages [200, 300] and [400, 470], both the best response and the gradient play autonomously adapt their parameter $a_k$ until it converges to the optimal operating point. As discussed before, the best response learning converges faster than the gradient play learning. To give a quantitative measure of the stability, the standard deviations of the expected accumulative throughput in Fig. 6.11 for different algorithms satisfy $\sigma(T^{expected}_{P\text{-}MAC}) / \sigma(T^{expected}_{CB}) \approx 7$ and the standard deviations of the actual achieved accumulative throughput satisfy $\sigma(T^{actual}_{P\text{-}MAC}) / \sigma(T^{actual}_{CB}) \approx 2$. We can see that, thanks to the inherent feedback control mechanism, both conjecture-based algorithms exhibit better stability than P-MAC.

Figure 6.10: The dynamics of the transmission probabilities in P-MAC and Algorithm 6.3.

Figure 6.11: The dynamics of the accumulative throughput in P-MAC and Algorithm 6.3.

We also simulate the evolution trajectory of the transmission probabilities of the proposed Algorithm 6.1 and the algorithm in [MHC09]. Both algorithms are essentially best-response based algorithms. Specifically, we consider a network with $K = 6$. The peak data rates for different nodes are $r_1 = 6$, $r_2 = 36$, $r_3 = 9$, $r_4 = 12$, $r_5 = 18$, and $r_6 = 54$, all in Mbps. We apply the algorithm in [MHC09] to solve the following network utility maximization problem:

$$\max_{\mathbf{p} \in \mathcal{P}} \sum_{k \in \mathcal{K}} \frac{1}{1 - \alpha} \left[ r_k p_k \prod_{j \in \mathcal{K} \setminus \{k\}} (1 - p_j) \right]^{1 - \alpha}, \qquad (6.58)$$

in which $\alpha = 2$. The optimal solution corresponds to the belief configuration $a_1 = 2.03$, $a_2 = 3.93$, $a_3 = 2.32$, $a_4 = 2.55$, $a_5 = 2.97$, and $a_6 = 4.74$. The trajectories of both algorithms are shown in Fig. 6.12. We can see that both algorithms converge very fast and oscillate around the neighborhood of the optimal solution after several iterations. However, as we discussed before, the algorithm in [MHC09] requires individual nodes to decode all the received packet headers and estimate the transmission probabilities of the other nodes individually, which introduces a great internal computational overhead when the network size grows large. In contrast, nodes deploying Algorithm 6.1 only have to estimate the probability of having a free channel without the need of decoding all the packets, which substantially reduces their computational efforts.

Figure 6.12: Comparison between Algorithm 6.1 and the algorithm in [MHC09].
Figure 6.13: Cumulative distribution function of $\rho(J^{BR})$ in ad-hoc networks.

We simulate the performance of the proposed algorithms in an ad-hoc network contained in a 100m x 100m square area. Nodes are placed randomly in the square area. Two nodes can interfere with each other if their distance is no more than 40m, i.e. $D_{th} = 40$m. We simulate three scenarios with the node numbers $K \in \{10, 20, 40\}$. The Pareto-efficient point that we select is the operating point associated with the link weight vector $\omega_k = 1$, $1 \le k \le K/2$, and $\omega_k = 0.5$, $K/2 < k \le K$ in (6.53). We can see from Fig. 6.13 that $\rho(J^{BR}) \le 1$ holds for all the simulated topologies. As shown in Fig. 6.13, in some realizations, $\rho(J^{BR}) = 1$, and hence, the associated operating points are not asymptotically stable. This occurs when two nodes interfere with each other and neither interfere with nor are interfered with by the remaining nodes in the entire ad-hoc network. On the other hand, the stability improves as the number of nodes increases. As long as the density of nodes is sufficiently large, the stability of the conjecture-based algorithm on the Pareto-efficient operating point can be achieved. Fig. 6.14 and Fig. 6.15 show the evolution of transmission probabilities and accumulative throughput for the IEEE 802.11 DCF and Algorithm 6.1 in a 10-node ad-hoc network with a randomly generated topology. The trajectory of the IEEE 802.11 DCF is obtained using the model in [LTH07]. The parameter $a_k$ in Algorithm 6.1 is chosen to be $|\mathcal{O}_k|$. The intuition behind this choice is that, if $|\mathcal{O}_k| = 0$, node $k$ can transmit at the maximal probability without interfering with any node. On the other hand, if $|\mathcal{O}_k|$ is large, node $k$ should back off adequately such that the reciprocity can be established. As shown in the figures, Algorithm 6.1 converges faster and achieves higher throughput than DCF. Similar results have been observed in the other simulated topologies.

Figure 6.14: Transmission probabilities of Algorithm 6.1 and the IEEE 802.11 DCF in ad-hoc networks.

Figure 6.15: Accumulative throughput of Algorithm 6.1 and the IEEE 802.11 DCF in ad-hoc networks.

6.6  Discussions 

6.6.1  Comparison  between  Type  I  and  Type  II  games 

The properties of Type II games have been investigated in the previous chapter.
Table 6.5 summarizes some similarities and differences between the two types
of games. First, the two types of games behave differently under the best
response dynamics. In Type I games, a stable CE may not be globally convergent,
whereas in Type II games the local stability of a CE implies its global
convergence. Second, it is shown that any operating point that is arbitrarily
close to the Pareto boundary of the utility region of Type I games is a stable CE.
Similarly, the entire Pareto boundary of Type II games is also stable. Finally,
different relationships between the parameter selection and the achieved utility
at equilibrium have been observed for the two types of games.

Table 6.5: Comparison between Type I and Type II games.

Games     Best response dynamics       Stability vs. efficiency    Fairness vs. parameter selection
Type I    local stability does not     stable at near-Pareto-      $u_n \propto \tau_n/\lambda_n$
          imply global convergence     optimal points
Type II   local stability implies      stable at the Pareto        $\omega_n = \tau_n/\lambda_n$ at the
          global convergence           boundary                    Pareto boundary

In particular, in Type I games, user $k$'s utility $u_k$ is approximately
proportional to the inverse of the parameter $\lambda_k$ in its belief function.
In contrast, in Type II games, if the CE is Pareto-optimal, as addressed in
Remark 5.5, the ratio $\tau_k/\lambda_k$ coincides with the weight $\omega_k$
assigned to user $k$ in the proportional fairness objective function. In
other words, based on the definition of proportional fairness [Kel97], we know


$$\sum_{k=1}^{K} \omega_k \, \frac{u'_k - u^*_k}{u^*_k} \le 0, \tag{6.59}$$

in which $(u'_1, u'_2, \ldots, u'_K)$ is the users' achieved utility associated with any other
feasible joint action and $(u^*_1, u^*_2, \ldots, u^*_K)$ is the optimal achieved utility for
problem (5.8) with $\omega_k = \tau_k/\lambda_k$ and $\sum_{k=1}^{K} \omega_k = 1$.
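
As a numerical illustration of inequality (6.59), the following minimal Python sketch evaluates its left-hand side for randomly drawn feasible actions. The utility form $u_k = a_k(\mu - \sum_m \tau_m a_m)$ is an assumption made here for illustration (the Type II definition is not restated in this section), as are the parameter values and the helper names `fairness_gap` and `type2_utilities`. Under that assumption, the operating point $a_k = \mu/(2\lambda_k)$ with $\lambda_k = \tau_k/\omega_k$ is used as the candidate optimum.

```python
import random

def fairness_gap(weights, u_star, u_other):
    """Left-hand side of (6.59): sum_k w_k * (u'_k - u*_k) / u*_k."""
    return sum(w * (uo - us) / us for w, us, uo in zip(weights, u_star, u_other))

def type2_utilities(actions, tau, mu):
    """Assumed Type II form (illustrative): u_k = a_k * (mu - sum_m tau_m * a_m)."""
    state = mu - sum(t * a for t, a in zip(tau, actions))
    return [a * state for a in actions]

# Hypothetical parameters chosen only for this example.
mu = 1.0
tau = [1.0, 2.0, 0.5]
weights = [0.5, 0.3, 0.2]                     # omega_k, summing to 1
lam = [t / w for t, w in zip(tau, weights)]   # lambda_k = tau_k / omega_k

# Candidate optimum under the assumed utility form: a_k = mu / (2 * lambda_k).
a_star = [mu / (2.0 * l) for l in lam]
u_star = type2_utilities(a_star, tau, mu)

rng = random.Random(0)
worst_gap = float("-inf")
for _ in range(10000):
    raw = [rng.random() for _ in tau]
    load = sum(t * a for t, a in zip(tau, raw))
    scale = rng.random() * mu / load          # keeps sum_m tau_m * a_m below mu
    a_other = [scale * a for a in raw]
    u_other = type2_utilities(a_other, tau, mu)
    worst_gap = max(worst_gap, fairness_gap(weights, u_star, u_other))

# The largest value found should not exceed zero (up to rounding).
print(worst_gap)
```

The check is only a sampled sanity test of (6.59) under the stated assumptions, not a proof; other utility forms or weights that do not sum to one would need a different candidate optimum.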


6.6.2  Pricing  Mechanism  vs.  Conjectural  Equilibrium 

In order to achieve Pareto-optimality, information exchange among users is
generally required so that they can collaboratively maximize the system
efficiency. Existing cooperative communication schemes either assume that the
information about all the users is gathered by a trusted moderator (e.g. access
point, base station, selected network leader, etc.), which is given the authority
to centrally divide the available resources among the participating users, or, in
the distributed setting, let users exchange price signals (e.g. the Lagrange
multipliers of the dual problem) that reflect the “cost” of consuming each unit of
the constrained resources, in order to maximize the social welfare and reach
Pareto-optimal allocations. As an important tool, the pricing mechanism has been
applied in the distributed optimization of various communication networks
[CLC07]. However, we would like to point out that the pricing mechanism generally
requires users to repeatedly exchange coordination information in order to
determine the optimal actions and achieve Pareto-optimality. In contrast, for the
linearly coupled games, since the specific structure of the utility function is
exploited, the CE approach is able to reach the Pareto-efficient operating point
in a distributed manner, without any real-time information exchange among users.
In fact, the underlying coordination is implicitly implemented when the
participating users initialize their belief parameters. Once the belief
parameters are properly initialized by the protocols, individual users running
the proposed dynamic update algorithms are able to achieve the Pareto-optimal CE
based solely on local observations of their own states, and no message exchange
is needed during the convergence process. Therefore, the conjectural equilibrium
approach is an important alternative to the pricing-based approach in the
linearly coupled games.
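
The absence of message exchange can be illustrated with a minimal Python sketch of best response dynamics under linear beliefs. It is a sketch under stated assumptions rather than the algorithms proposed in this dissertation: the Type II utility $u_k = a_k(\mu - \sum_m \tau_m a_m)$, the linear belief $\tilde{s}_k(a) = s - \lambda_k(a - a_k)$, the parameter values, and the names `best_response` and `run_ce_dynamics` are all hypothetical choices for illustration. Each user reads only the locally observed state and its own previous action; the only coordination is the one-time choice of $\lambda_k = \tau_k/\omega_k$.

```python
def best_response(action, state, lam):
    """Maximize a * (state - lam * (a - action)) over a >= 0.

    The objective is concave in a, so the first-order condition gives
    a = (state + lam * action) / (2 * lam).
    """
    return max(0.0, (state + lam * action) / (2.0 * lam))

def run_ce_dynamics(tau, lam, mu, actions, steps=60):
    """Simultaneous best response updates using only locally observed state."""
    for _ in range(steps):
        # Every user measures the same aggregate state; no messages are exchanged.
        state = mu - sum(t * a for t, a in zip(tau, actions))
        actions = [best_response(a, state, l) for a, l in zip(actions, lam)]
    return actions

# Hypothetical parameters chosen only for this example.
mu = 1.0
tau = [1.0, 2.0, 0.5]
weights = [0.5, 0.3, 0.2]                     # omega_k, summing to 1
lam = [t / w for t, w in zip(tau, weights)]   # beliefs initialized once, up front

final = run_ce_dynamics(tau, lam, mu, actions=[0.05, 0.05, 0.05])
expected = [mu / (2.0 * l) for l in lam]      # fixed point of the update above
print(final, expected)
```

In this toy setting the iterates approach the fixed point $a_k = \mu/(2\lambda_k)$, which is the operating point associated with weights $\omega_k$ under the assumed utility form.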

6.7  Concluding  Remarks 

In this chapter, we propose distributed solutions that enable autonomous nodes
to improve their throughput performance in random access networks. It has been
observed in the context of random access control that a tragedy of the commons
might take place if nodes behave selfishly and myopically. Hence, we investigate
whether forming internal belief functions and learning the impact of various
actions can alter the interaction outcome among these intelligent nodes.
Specifically, two distributed mechanisms are proposed to dynamically update
individual nodes' transmission probabilities. It is analytically proven that
essentially the entire throughput region consists of stable conjectural
equilibria. In addition, we prove that the conjecture-based approach achieves
weighted fairness for heterogeneous traffic classes, and we extend the
distributed learning solutions to ad-hoc networks. Simulation results have shown
that the proposed algorithms achieve significant performance improvement over
existing protocols, including the IEEE 802.11 DCF and the P-MAC protocol, in
terms of not only fairness and throughput but also convergence and stability. A
potential future direction is to investigate how to detect and prevent
misbehavior.
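
The tragedy of the commons noted in these remarks can be reproduced with a small Python sketch. It assumes the standard slotted random access throughput model, in which node $k$ succeeds only when it alone transmits, so $u_k = p_k \prod_{m \neq k} (1 - p_m)$; this model, the grid search, and the function names are illustrative assumptions, not definitions taken from this chapter.

```python
from math import prod

def throughputs(p):
    """Assumed model: node k succeeds iff it transmits and all others stay silent."""
    return [pk * prod(1.0 - pm for m, pm in enumerate(p) if m != k)
            for k, pk in enumerate(p)]

def myopic_best_response(p, k, grid=101):
    """Grid search for the p_k that maximizes node k's own throughput,
    holding the other nodes' transmission probabilities fixed."""
    candidates = [i / (grid - 1) for i in range(grid)]

    def own_throughput(x):
        trial = list(p)
        trial[k] = x
        return throughputs(trial)[k]

    return max(candidates, key=own_throughput)

n = 4
start = [1.0 / n] * n

# One round of simultaneous myopic updates: every node ignores how the
# others will react and simply maximizes its current throughput.
selfish = [myopic_best_response(start, k) for k in range(n)]

print(selfish, throughputs(selfish))   # every node transmits with probability 1
print(start, throughputs(start))       # moderate probabilities keep throughput positive
```

Because each node's throughput is increasing in its own transmission probability whenever the others are not always transmitting, myopic updates push every node toward always transmitting, at which point all transmissions collide.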

CHAPTER 7


Conclusion 

This dissertation illustrates the optimal strategies for users to improve their
performance in an informationally decentralized multiuser communication
environment. First, we propose and investigate a new game model, which we refer
to as additively coupled sum constrained games (ACSCG), in which each player is
subject to a sum constraint and its utility is additively impacted by the
remaining users' actions. We derive sufficient conditions under which a unique
pure NE exists and the best response dynamics converges globally and linearly to
the NE without any real-time information exchange among users. Second, we
investigate the multi-channel power control game to understand how to improve
upon the performance of an inefficient NE in ACSCG. Specifically, we consider two
game-theoretic solution concepts: Stackelberg equilibrium and conjectural
equilibrium. It is shown that a foresighted leader that exploits the inter-user
coupling, by considering the utility structures and modeling its own state (i.e.
experienced interference) as a function of its own action, and that forms proper
conjectures can improve both its own utility and the utilities of its
competitors, even if it has no a priori knowledge of their private information.
Third, we propose and investigate linearly coupled games, in which users'
utilities are linearly impacted by their competitors' actions. For linearly
coupled games, we prove that if every user plays the CE strategy with an
appropriate belief configuration, the entire system can be induced to a stable
Pareto-optimal operating point without exchanging any real-time information.